Abstract

In this paper we introduce a nonuniform sparsity model and analyze the performance of an optimized weighted $\ell_1$ minimization over that sparsity model. In particular, we focus on a model where the entries of the unknown vector fall into two sets, with entries of each set having a specific probability of being nonzero. We propose a weighted $\ell_1$ minimization recovery algorithm and analyze its performance using a Grassmann angle approach. We compute explicitly the relationship between the system parameters (the weights, the number of measurements, the size of the two sets, and the probabilities of being nonzero) so that when i.i.d. random Gaussian measurement matrices are used, the weighted $\ell_1$ minimization recovers a randomly selected signal drawn from the considered sparsity model with overwhelming probability as the problem dimension increases. This allows us to compute the optimal weights. We demonstrate through rigorous analysis and simulations that for the case when the support of the signal can be divided into two different subclasses with unequal sparsity fractions, the optimal weighted $\ell_1$ minimization outperforms the regular $\ell_1$ minimization substantially. We also generalize the results to an arbitrary number of classes.



1 Introduction 

Compressed sensing is an emerging technique of joint sampling and compression that has been recently proposed as an alternative to Nyquist sampling (followed by compression) for scenarios where measurements can be costly [24]. The whole premise is that sparse signals (signals with many zero or negligible elements over a known basis) can be recovered with far fewer measurements than the ambient dimension of the signal itself. In fact, the major breakthrough in this area has been the demonstration that $\ell_1$ minimization can efficiently recover a sufficiently sparse vector from a system of underdetermined linear equations [2]. $\ell_1$ minimization is usually posed as the convex relaxation of $\ell_0$ minimization, which solves for the sparsest solution of a system of linear equations and is NP-hard.

*The results of this paper were presented in part at the International Symposium on Information Theory, ISIT 2009.

The conventional approach to compressed sensing assumes no prior information on the unknown signal other than the fact that it is sufficiently sparse over a particular basis. In many applications, however, additional prior information is available. In fact, in many cases the signal recovery problem that compressed sensing addresses is a detection or estimation problem in some statistical setting. Some recent work along these lines considers compressed detection and estimation, Bayesian compressed sensing [7], and model-based compressed sensing, which allows for model-based recovery algorithms [8]. In a more general setting, compressed sensing may be the inner loop of a larger estimation problem that feeds prior information on the sparse signal (e.g., its sparsity pattern) to the compressed sensing algorithm [10], [11].

In this paper we will consider a particular model for the sparse signal where the entries of the unknown vector fall into a number $u$ of classes, with each class having a specific fraction of nonzero entries. The standard compressed sensing model is therefore a special case where there is only one class. As mentioned above, there are many situations where such prior information may be available, such as in natural images, medical imaging, or in DNA microarrays. In DNA microarray applications for instance, signals are often block sparse, i.e., the signal is more likely to be nonzero in certain blocks rather than in others [9]. While it is possible (albeit cumbersome) to study this model in full generality, in this paper we will focus on the case where the entries of the unknown signal fall into a fixed number $u$ of categories; in the $i$th set $K_i$ with cardinality $n_i$, the fraction of nonzero entries is $p_i$. This model is rich enough to capture many of the salient features regarding prior information. We refer to the signals generated based on this model as nonuniform sparse signals.

A signal generated based on this model could resemble the vector representation of a natural image in the domain of some linear transform (e.g. Discrete Fourier Transform, Discrete Cosine Transform, Discrete Wavelet Transform, ...) or the spatial representation of some biomedical image, e.g., a brain fMRI image. Although a brain fMRI image is not necessarily sparse, the subtraction of the brain image at any moment during an experiment from an initial background image of inactive brain mode is indeed a sparse signal, which demonstrates the additional brain activity during the specific course of the experiment. Moreover, depending on the assigned task, the experimenter might have some prior information. For example, it might be known that some regions of the brain are more likely to be involved in the decision making process than others. This can be captured in the above nonuniform sparse model by considering a higher value $p_i$ for the more active region. Similarly, this model is applicable to other problems like network monitoring (see [17] for an application of compressed sensing and nonlinear estimation in compressed network monitoring), DNA microarrays [19], [20], [21], astronomy, satellite imaging and many more practical examples.

In this paper we first analyze this model for the case where there are $u \geq 2$ categories of entries, and demonstrate through rigorous analysis and simulations that the recovery performance can be significantly boosted by exploiting the additional information. We find a closed form expression for the recovery threshold for $u = 2$. We also generalize the results to the case of $u > 2$. A further interesting question to be addressed in future work would be to characterize the gain in recovery percentage as a function of the number of distinguishable classes $u$. It is worth mentioning that a somewhat similar model for prior information has been considered in [5]. There, it has been assumed that part of the support is completely known a priori or due to previous processing. A modification of the regular $\ell_1$ minimization based on the given information is proven to achieve significantly better recovery guarantees. As will be discussed, this model can be cast as a special case of the nonuniform sparse model, where the sparsity fraction is equal to unity in one of the classes. Therefore, using the generalized tools of this work, we can explicitly find the recovery thresholds for the method proposed in [5]. This is in contrast to the recovery guarantees of [5], which are given in terms of the restricted isometry property (RIP).

The contributions of the paper are the following. We propose a weighted $\ell_1$ minimization approach for sparse recovery where the $\ell_1$ norms of different classes ($K_i$'s) are assigned different weights $w_{K_i}$ ($1 \leq i \leq u$). Clearly, one would want to give a larger weight to the entries with a higher chance of being zero and thus further force them to be zero.¹ The second contribution is that we explicitly compute the relationship between $p_i$, $w_{K_i}$, $\frac{|K_i|}{n}$, $1 \leq i \leq u$, and the number of measurements so that the unknown signal can be recovered with overwhelming probability as $n \to \infty$ (the so-called weak and strong thresholds) for measurement matrices drawn from an i.i.d. Gaussian ensemble. The analysis uses the high-dimensional geometry techniques first introduced by Donoho and Tanner [1], [3] (e.g., Grassmann angles) to obtain sharp thresholds for compressed sensing. However, rather than the neighborliness condition used in [1], [3], we find it more convenient to use the null space characterization of Xu and Hassibi [4], [16]. The resulting Grassmannian manifold approach is a general framework for incorporating additional factors into compressed sensing: in [4] it was used to incorporate approximately sparse signals; here it is used to incorporate prior information and weighted $\ell_1$ optimization. Our analytic results allow us to precisely compute the optimal weights for any $p_i$, $n_i$, $1 \leq i \leq u$. We also provide certain robustness conditions for the recovery scheme for compressible signals or under model mismatch. We present simulation results to show the advantages of the weighted method over standard $\ell_1$ minimization. Furthermore, the results of this paper for the case of two classes ($u = 2$) build a rigid framework for analyzing certain classes of re-weighted $\ell_1$ minimization algorithms. In a re-weighted $\ell_1$ minimization algorithm, the post processing information from the estimate of the signal at each step can be viewed as additional prior information about the signal, and can be incorporated into the next step as appropriate weights. In a further work we have been able to analytically prove the threshold improvement in a reweighted $\ell_1$ minimization using this framework [18]. It is worth mentioning that we have prepared a software package based on the results of this paper for threshold computation using weighted $\ell_1$ minimization, and it is available in [23].

¹A somewhat related method that uses weighted $\ell_1$ optimization is by Candes et al. [10]. The main difference is that there is no prior information. At each step, the $\ell_1$ optimization is re-weighted using the estimate of the signal obtained in the last minimization step.

The paper is organized as follows. In the next section we briefly describe the notation that we use throughout the paper. In Section 3 we describe the model and state the principal assumptions of nonuniform sparsity that we are interested in. We also sketch the objectives that we are aiming for and clarify what we mean by recovery improvement in the weighted $\ell_1$ case. In Section 4 we skim through our critical theorems and try to present the big picture of the main results. Section 5 is dedicated to the concrete derivation of these results. In Section 6 we briefly introduce the reweighted $\ell_1$ minimization algorithm, and provide some insights into how the derivations of this work can be used to analyze the improved recovery thresholds. In Section 7 some simulation results are presented and are compared to the analytical bounds of the previous sections. The paper ends with a conclusion and discussion of future work in Section 8.

2 Basic Definitions and Notations 

Throughout the paper, vectors are denoted by small boldface letters $\mathbf{x}, \mathbf{w}, \mathbf{z}, \ldots$, scalars are shown by small regular letters $a, b, \alpha, \ldots$, and matrices are denoted by bold capital letters ($\mathbf{A}, \mathbf{I}, \ldots$). For referring to geometrical objects and subspaces, we use calligraphic notation, e.g. $\mathcal{Z}, \mathcal{F}, \mathcal{G}, \mathcal{P}, \mathcal{C}, \ldots$. This includes the notations that we use to indicate the faces of a high dimensional polytope, or the polytope itself. Sets and random variables are denoted by regular capital letters ($K, S, \ldots$). The normal distribution with mean $\mu$ and variance $\sigma^2$ is denoted by $\mathcal{N}(\mu, \sigma^2)$. For functions we use both lowercase and capital letters, and the meaning should be clear from the context. We use the abbreviations RHS and LHS for Right Hand Side and Left Hand Side respectively throughout the paper.

Definition 1. A random variable $Y$ is said to have a Half Normal distribution $HN(0, \sigma^2)$ if $Y = |X|$ where $X$ is a zero mean normal variable $X \sim \mathcal{N}(0, \sigma^2)$.

3 Problem Description 

We first define the signal model. For completeness, we present a general definition. 

Definition 2. Let $\mathcal{K} = \{K_1, K_2, \ldots, K_u\}$ be a partition of $\{1, 2, \ldots, n\}$, i.e. ($K_i \cap K_j = \emptyset$ for $i \neq j$, and $\bigcup_{i=1}^{u} K_i = \{1, 2, \ldots, n\}$), and $P = \{p_1, p_2, \ldots, p_u\}$ be a set of positive numbers in $[0, 1]$. An $n \times 1$ vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$ is said to be a random nonuniformly sparse vector with sparsity fraction $p_i$ over the set $K_i$ for $1 \leq i \leq u$, if $\mathbf{x}$ is generated from the following random procedure:

• Over each set $K_i$, $1 \leq i \leq u$, the set of nonzero entries of $\mathbf{x}$ is a random subset of size $p_i |K_i|$. In other words, a fraction $p_i$ of the entries are nonzero in $K_i$. $p_i$ is called the sparsity fraction over $K_i$. The values of the nonzero entries of $\mathbf{x}$ can be selected arbitrarily from any symmetric distribution. We can choose $\mathcal{N}(0, 1)$ for simplicity.

Figure 1: Illustration of a nonuniformly sparse signal.
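As a minimal sketch of the random procedure in Definition 2, the following Python snippet draws a nonuniformly sparse vector. The function name, the assumption that each class $K_i$ occupies a contiguous block of indices, the rounding of $p_i |K_i|$ to the nearest integer, and the choice of $\mathcal{N}(0, 1)$ amplitudes are illustrative assumptions, not part of the definition.

```python
import random


def nonuniform_sparse_vector(class_sizes, fractions, seed=0):
    """Draw a nonuniformly sparse vector (illustrative sketch of Definition 2).

    class_sizes[i] plays the role of |K_i| and fractions[i] the role of p_i.
    Assumption: class K_i occupies a contiguous block of indices.
    """
    rng = random.Random(seed)
    x = []
    for n_i, p_i in zip(class_sizes, fractions):
        k_i = round(p_i * n_i)                      # number of nonzeros in K_i
        support = set(rng.sample(range(n_i), k_i))  # random subset of size k_i
        # Nonzero amplitudes are N(0, 1); any symmetric distribution would do.
        x.extend(rng.gauss(0.0, 1.0) if j in support else 0.0 for j in range(n_i))
    return x


# Two equal classes with n = 1000, p_1 = 0.3 and p_2 = 0.05.
x = nonuniform_sparse_vector([500, 500], [0.3, 0.05])
nonzeros_first = sum(1 for v in x[:500] if v != 0.0)
nonzeros_second = sum(1 for v in x[500:] if v != 0.0)
```

With these parameters the first half of the vector carries 150 nonzero entries and the second half 25, so the second class is much sparser than the first.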

In Figure 1, a sample nonuniformly sparse signal with Gaussian distribution for nonzero entries is plotted. The number of sets is considered to be $u = 2$ and both classes have the same size $\frac{n}{2}$, with $n = 1000$. The sparsity fraction for the first class $K_1$ is $p_1 = 0.3$, and for the second class $K_2$ is $p_2 = 0.05$. In fact, the signal is much sparser in the second half than it is in the first half. The advantageous feature of this model is that all the resulting computations are independent of the actual distribution on the amplitude of the nonzero entries. However, as expected, it is not independent of the properties of the measurement matrix. We assume that the measurement matrix $\mathbf{A}$ is an $m \times n$ matrix with i.i.d. standard Gaussian distributed $\mathcal{N}(0, 1)$ entries, with $\frac{m}{n} = \delta < 1$. The measurement vector is denoted by $\mathbf{y}$ and obeys the following:

$$\mathbf{y} = \mathbf{A}\mathbf{x}. \qquad (1)$$

As mentioned in Section 1, $\ell_1$ minimization can recover a randomly selected vector $\mathbf{x}$ with $k = \mu n$ nonzero entries with high probability, provided $\mu$ is less than a known function of $\delta$. $\ell_1$ minimization has the following form:

$$\min_{\mathbf{A}\mathbf{x} = \mathbf{y}} \|\mathbf{x}\|_1. \qquad (2)$$

The reference [1] provides an explicit relationship between $\mu$ and the minimum $\delta$ that guarantees success of $\ell_1$ minimization recovery in the case of Gaussian measurements and provides the corresponding numerical curve. The optimization in (2) is a linear program and can be solved polynomially fast ($O(n^3)$).

However, it does not exploit additional prior information about the nature of the signal, should any such information be available. One can simply think of modifying (2) to a weighted $\ell_1$ minimization as follows:

$$\min_{\mathbf{A}\mathbf{x} = \mathbf{y}} \|\mathbf{x}\|_{\mathbf{w},1} = \min_{\mathbf{A}\mathbf{x} = \mathbf{y}} \sum_{i=1}^{n} w_i |x_i|. \qquad (3)$$

The index $\mathbf{w}$ on the norm indicates the $n \times 1$ positive weight vector. Now the questions are: i) what is the optimal set of weights for a certain set of available prior information, and ii) can one improve the recovery threshold using the weighted $\ell_1$ minimization of (3) by choosing a set of optimal weights? We have to be clearer about our objective at this point and clarify what we mean by improving the recovery threshold. Generally speaking, if a recovery method can reconstruct all signals of a certain model with certainty, then that method is said to be strongly successful on that signal model. If we have a class of models that can be identified with a parameter $\theta$, and if for all models corresponding to $\theta < \theta_0$ a recovery scheme is strongly successful, then the threshold $\theta_0$ is called a strong recovery threshold for the parameter $\theta$. For example, for fixed $\frac{m}{n}$, if $k < n$ is sufficiently small, then $\ell_1$ minimization can provably recover all $k$-sparse signals, provided that appropriate linear measurements have been made from the signal. The maximum such $k$ is called the strong recovery threshold of the sparsity for the success of $\ell_1$ minimization. Likewise, for a fixed ratio $\mu = \frac{k}{n}$, the minimum ratio of measurements to ambient dimension $\frac{m}{n}$ for which $\ell_1$ minimization always recovers $k$-sparse signals from the given $m$ linear measurements is called the strong recovery threshold for the number of measurements for $\ell_1$ minimization. In contrast, one can also look into the weak recovery threshold, defined as the threshold below which, with very high probability, a random vector generated from the model is recoverable. For the nonuniformly sparse model, the quantity of interest is the overall sparsity fraction of the model, defined as $\frac{\sum_{i=1}^{u} p_i |K_i|}{n}$. The question we ask is whether by adjusting the $w_i$'s according to the $p_i$'s one can extend the strong or weak recovery threshold for sparsity fraction to a value above the known threshold of $\ell_1$ minimization. Equivalently, for given classes $K_1, \ldots, K_u$ and sparsity fractions $p_i$'s, how much can the strong or weak threshold be improved for the minimum number of required measurements, as opposed to the case of uniform sparsity with the same overall sparsity fraction?
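To make two quantities of this section concrete, the short Python sketch below evaluates the overall sparsity fraction $\sum_i p_i |K_i| / n$ and the weighted $\ell_1$ objective $\sum_i w_i |x_i|$. The function names and the toy inputs are illustrative assumptions; the sketch only evaluates the objective and does not solve the minimization itself.

```python
def overall_sparsity_fraction(class_sizes, fractions):
    """Return sum_i p_i |K_i| / n for class sizes |K_i| and fractions p_i."""
    n = sum(class_sizes)
    return sum(p * k for p, k in zip(fractions, class_sizes)) / n


def weighted_l1_norm(x, w):
    """Return sum_i w_i |x_i|, the objective of the weighted l1 program."""
    return sum(w_i * abs(x_i) for x_i, w_i in zip(x, w))


# Two classes of size 500 with p_1 = 0.3 and p_2 = 0.05 (n = 1000).
rho = overall_sparsity_fraction([500, 500], [0.3, 0.05])

# Hypothetical 3-entry vector: weight 1 on the first entry, 2.5 on the others.
objective = weighted_l1_norm([1.0, -2.0, 0.0], [1.0, 2.5, 2.5])
```

Here `rho` is 0.175 up to floating-point rounding, which is the sparsity fraction a uniform model would need in order to have the same expected number of nonzero entries.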

4 Summary of Main Results 

We state the two problems more formally using the notion of recovery thresholds that we defined in the previous section. We only consider the case of $u = 2$.

• Problem 1 Consider the random nonuniformly sparse model with two classes $K_1$, $K_2$ of cardinalities $n_1 = \gamma_1 n$ and $n_2 = \gamma_2 n$ respectively, and given sparsity fractions $p_1$ and $p_2$. Let $\mathbf{w}$ be a given weight vector. As $n \to \infty$, what is the weak (strong) recovery threshold for $\delta = \frac{m}{n}$ so that a randomly chosen vector (all vectors) $\mathbf{x}_0$ selected from the nonuniformly sparse model is successfully recovered by the weighted $\ell_1$ minimization of (3) with high probability?

Upon solving Problem 1, one can exhaustively search for the weight vector $\mathbf{w}$ that results in the minimum recovery threshold for $\delta$. This is what we recognize as the optimum set of weights. So the second problem can be stated as:

• Problem 2 Consider the random nonuniformly sparse model defined by classes $K_1$, $K_2$ of cardinalities $n_1$ and $n_2$ respectively, with $\gamma_1 = \frac{n_1}{n}$ and $\gamma_2 = \frac{n_2}{n}$, and given sparsity fractions $p_1$ and $p_2$. What is the optimum weight vector $\mathbf{w}$ in (3) that results in the minimum number of measurements for almost sure recovery of signals generated from the given random nonuniformly sparse model?

We will fully solve these problems in this paper. We first connect the misdetection event to the properties of the measurement matrix. For the non-weighted case, this has been considered in [16] and is known as the null space property. We generalize this result to the case of weighted $\ell_1$ minimization, and state a necessary and sufficient condition for (3) to recover the original signal of interest. The theorem is as follows.

Theorem 4.1. For all $n \times 1$ vectors $\mathbf{x}^*$ supported on the set $K \subseteq \{1, 2, \ldots, n\}$, $\mathbf{x}^*$ is the unique solution to the linear program $\min_{\mathbf{A}\mathbf{x} = \mathbf{y}} \sum_{i=1}^{n} w_i |x_i|$ with $\mathbf{y} = \mathbf{A}\mathbf{x}^*$, if and only if for every nonzero vector $\mathbf{z} = (z_1, z_2, \ldots, z_n)^T$ in the null space of $\mathbf{A}$, the following holds: $\sum_{i \in \overline{K}} w_i |z_i| > \sum_{i \in K} w_i |z_i|$, where $\overline{K}$ denotes the complement of $K$.
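The condition of Theorem 4.1 is easy to test for a single null space vector. The sketch below does this for a toy example; the function name, the $2 \times 3$ matrix and the weight values are hypothetical choices made only for illustration. Because the toy null space is one-dimensional and the condition is invariant to scaling $\mathbf{z}$, checking one spanning vector is enough here; for a general matrix the condition must hold over the whole null space.

```python
def null_space_condition_holds(z, support, w):
    """Check sum_{i not in K} w_i |z_i| > sum_{i in K} w_i |z_i| for one z."""
    on_support = sum(w[i] * abs(z[i]) for i in range(len(z)) if i in support)
    off_support = sum(w[i] * abs(z[i]) for i in range(len(z)) if i not in support)
    return off_support > on_support


# Toy example (hypothetical): A = [[1, 1, 0], [0, 1, 1]] has a one-dimensional
# null space spanned by z = (1, -1, 1).
z = [1.0, -1.0, 1.0]
K = {0, 1}  # candidate support (0-based indices)

unit_weights_ok = null_space_condition_holds(z, K, [1.0, 1.0, 1.0])
larger_weight_ok = null_space_condition_holds(z, K, [1.0, 1.0, 3.0])
```

With unit weights the condition fails for the support `{0, 1}`, whereas placing a larger weight on the index outside the support makes it hold. This is the mechanism by which well-chosen weights can enlarge the set of recoverable supports.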

This theorem will be proved in Section 5. As will be explained in Section 5.1, Theorem 4.1, along with known facts about the null space of random Gaussian matrices, helps us interpret the probability of recovery error in terms of a high dimensional geometrical object called the complementary Grassmann angle; namely the probability that a uniformly chosen $(n - m)$-dimensional subspace $\mathcal{Z}$ shifted by a point $\mathbf{x}$ of unit weighted $\ell_1$-norm, $\sum_{i=1}^{n} w_i |x_i| = 1$, intersects the weighted $\ell_1$-ball $\mathcal{P}_{\mathbf{w}} = \{\mathbf{y} \in \mathbb{R}^n \mid \sum_{i=1}^{n} w_i |y_i| \leq 1\}$ nontrivially at some other point besides $\mathbf{x}$. The shifted subspace is denoted by $\mathcal{Z} + \mathbf{x}$. A fact that we take for granted, without explicitly proving it, is that due to the identical marginal distribution of the entries of $\mathbf{x}$ in each of the sets $K_1$ and $K_2$, the entries of the optimal weight vector take at most two (or in the general case $u$) distinct values $w_{K_1}$ and $w_{K_2}$ depending on their index. In other words

$$\forall i \in \{1, 2, \ldots, n\}: \quad w_i = \begin{cases} w_{K_1} & \text{if } i \in K_1 \\ w_{K_2} & \text{if } i \in K_2 \end{cases} \qquad (4)$$

Leveraging the existing techniques for computing the complementary Grassmann angle [14], [15], we will be able to state and prove the following theorem along the same lines, which upper bounds the probability that the weighted $\ell_1$ minimization does not recover the signal. Please note that in the following theorem, the rigorous mathematical definitions of some of the terms (internal angle and external angle) are not presented, due to the extent of the descriptions. They will however be defined rigorously later in the derivations of the main results in Section 5.

Theorem 4.2. Let $K_1$ and $K_2$ be two disjoint subsets of $\{1, 2, \ldots, n\}$ such that $|K_1| = n_1$, $|K_2| = n_2$, and $p_1$ and $p_2$ be real numbers in $[0, 1]$. Also, let $k_1 = p_1 n_1$, $k_2 = p_2 n_2$, and $E$ be the event that a random nonuniformly sparse vector $\mathbf{x}_0$ (Definition 2) with sparsity fractions $p_1$ and $p_2$ over the sets $K_1$ and $K_2$ respectively is recovered via the weighted $\ell_1$ minimization of (3) with $\mathbf{y} = \mathbf{A}\mathbf{x}_0$. Also, let $E^c$ denote the complement event of $E$. Then

$$\mathbb{P}\{E^c\} \leq \sum_{\substack{0 \leq t_1 \leq n_1 - k_1 \\ 0 \leq t_2 \leq n_2 - k_2 \\ t_1 + t_2 > m - k_1 - k_2 + 1}} 2^{t_1 + t_2 + 1} \binom{n_1 - k_1}{t_1} \binom{n_2 - k_2}{t_2} \beta(k_1, k_2 | t_1, t_2)\, \zeta(t_1 + k_1, t_2 + k_2) \qquad (5)$$

where $\beta(k_1, k_2 | t_1, t_2)$ is the internal angle between a $(k_1 + k_2 - 1)$-dimensional face $\mathcal{F}$ of the weighted $\ell_1$-ball $\mathcal{P}_{\mathbf{w}} = \{\mathbf{y} \in \mathbb{R}^n \mid \sum_{i=1}^{n} w_i |y_i| \leq 1\}$ with $k_1$ vertices supported on $K_1$ and $k_2$ vertices supported on $K_2$, and another $(k_1 + k_2 + t_1 + t_2 - 1)$-dimensional face $\mathcal{G}$ that encompasses $\mathcal{F}$ and has $t_1 + k_1$ vertices supported on $K_1$ and the remaining $t_2 + k_2$ vertices supported on $K_2$. $\zeta(d_1, d_2)$ is the external angle between a face $\mathcal{G}$ supported on a set $L$ with $|L \cap K_1| = d_1$ and $|L \cap K_2| = d_2$ and the weighted $\ell_1$-ball $\mathcal{P}_{\mathbf{w}}$. See Section 5.1 for the definitions of internal and external angles.

The proof of this theorem will be given in Section 5.2. We are interested in the regimes that make the above upper bound decay to zero as $n \to \infty$, which requires the cumulative exponent in (5) to be negative. We are able to calculate sharp upper bounds on the exponents of the terms in (5) by using large deviations of sums of normal and half normal variables. More precisely, for small enough $\epsilon$, if we assume that the sum of the terms corresponding to particular indices $t_1$ and $t_2$ in (5) is denoted by $F(t_1, t_2)$, and define $\tau_1 = \frac{t_1}{n}$ and $\tau_2 = \frac{t_2}{n}$, then we are able to find and compute an exponent function $\psi_{tot}(\tau_1, \tau_2) = \psi_{com}(\tau_1, \tau_2) - \psi_{int}(\tau_1, \tau_2) - \psi_{ext}(\tau_1, \tau_2)$ so that $\frac{1}{n} \log F(t_1, t_2) \sim \psi_{tot}(\tau_1, \tau_2)$ as $n \to \infty$. The terms $\psi_{com}(\cdot, \cdot)$, $\psi_{int}(\cdot, \cdot)$ and $\psi_{ext}(\cdot, \cdot)$ are the contributions to the cumulative exponent $\psi_{tot}$ by the so-called combinatorial, internal angle and external angle terms respectively, existing in the upper bound (5). The derivations of these terms will be elaborated in Section 5.2.3. Consequently, we state a key theorem that is the implicit answer to Problem 1.

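Theorem 4.3. Let $\delta = \frac{m}{n}$ be the ratio of the number of measurements to the signal dimension, $\gamma_1 = \frac{n_1}{n}$ and $\gamma_2 = \frac{n_2}{n}$. For fixed values of $\gamma_1$, $\gamma_2$, $p_1$, $p_2$, $\omega = \frac{w_{K_2}}{w_{K_1}}$, define $E$ to be the event that a random nonuniformly sparse vector $\mathbf{x}_0$ (Definition 2) with sparsity fractions $p_1$ and $p_2$ over the sets $K_1$ and $K_2$ respectively, with $|K_1| = \gamma_1 n$ and $|K_2| = \gamma_2 n$, is recovered via the weighted $\ell_1$ minimization of (3) with $\mathbf{y} = \mathbf{A}\mathbf{x}_0$. There exists a critical threshold $\delta_c = \delta_c(\gamma_1, \gamma_2, p_1, p_2, \omega)$ such that if $\delta = \frac{m}{n} \geq \delta_c$, then $\mathbb{P}\{E^c\}$ decays exponentially to zero as $n \to \infty$. Furthermore, $\delta_c$ is given by

$$\delta_c = \min\{\delta \mid \psi_{com}(\tau_1, \tau_2) - \psi_{int}(\tau_1, \tau_2) - \psi_{ext}(\tau_1, \tau_2) < 0 \;\; \forall\, 0 \leq \tau_1 \leq \gamma_1(1 - p_1),\; 0 \leq \tau_2 \leq \gamma_2(1 - p_2),\; \tau_1 + \tau_2 > \delta - \gamma_1 p_1 - \gamma_2 p_2\}$$

where $\psi_{com}$, $\psi_{int}$ and $\psi_{ext}$ are obtained from the following expressions:

Define $g(x) = \frac{2}{\sqrt{\pi}} e^{-x^2}$, $G(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-y^2}\, dy$ and let $\varphi(\cdot)$ and $\Phi(\cdot)$ be the standard Gaussian pdf and cdf functions respectively.

1. (Combinatorial exponent)

$$\psi_{com}(\tau_1, \tau_2) = \left(\gamma_1(1 - p_1) H\!\left(\frac{\tau_1}{\gamma_1(1 - p_1)}\right) + \gamma_2(1 - p_2) H\!\left(\frac{\tau_2}{\gamma_2(1 - p_2)}\right) + \tau_1 + \tau_2\right) \log 2 \qquad (6)$$

where $H(\cdot)$ is the entropy function defined by $H(x) = -x \log x - (1 - x) \log(1 - x)$.

2. (External angle exponent) Define $c = (\tau_1 + \gamma_1 p_1) + \omega^2(\tau_2 + \gamma_2 p_2)$, $\alpha_1 = \gamma_1(1 - p_1) - \tau_1$ and $\alpha_2 = \gamma_2(1 - p_2) - \tau_2$. Let $x_0$ be the unique solution to $x$ of the following:

$$2c - \frac{g(x)\alpha_1}{x G(x)} - \frac{\omega g(\omega x)\alpha_2}{x G(\omega x)} = 0$$

Then

$$\psi_{ext}(\tau_1, \tau_2) = c x_0^2 - \alpha_1 \log G(x_0) - \alpha_2 \log G(\omega x_0) \qquad (7)$$

3. (Internal angle exponent) Let $b = \frac{\tau_1 + \omega^2 \tau_2}{\tau_1 + \tau_2}$, $\Omega' = \gamma_1 p_1 + \omega^2 \gamma_2 p_2$ and $Q(s) = \frac{\tau_1 \varphi(s)}{(\tau_1 + \tau_2)\Phi(s)} + \frac{\omega \tau_2 \varphi(\omega s)}{(\tau_1 + \tau_2)\Phi(\omega s)}$. Define the function $M(s) = -\frac{s}{Q(s)}$ and solve for $s$ in $M(s) = \frac{\tau_1 + \tau_2}{(\tau_1 + \tau_2) b + \Omega'}$. Let the unique solution be $s^*$ and set $y = s^*\left(b - \frac{1}{M(s^*)}\right)$. Compute the rate function $\Lambda^*(y) = sy - \frac{\tau_1}{\tau_1 + \tau_2}\Lambda_1(s) - \frac{\tau_2}{\tau_1 + \tau_2}\Lambda_1(\omega s)$ at the point $s = s^*$, where $\Lambda_1(s) = \frac{s^2}{2} + \log(2\Phi(s))$. The internal angle exponent is then given by:

$$\psi_{int}(\tau_1, \tau_2) = \left(\Lambda^*(y) + \frac{\tau_1 + \tau_2}{2\Omega'} y^2 + \log 2\right)(\tau_1 + \tau_2) \qquad (8)$$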
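The first two exponents can be evaluated numerically with only the standard library. The sketch below is illustrative rather than the authors' implementation: it assumes that $H$ in the combinatorial exponent is measured in bits (so that multiplying by $\log 2$ gives the natural-log growth rate of the binomial coefficients in (5)), it uses the identity $G(x) = \mathrm{erf}(x)$, and it finds $x_0$ by bisection on the stationarity equation multiplied through by $x$, namely $2cx - \alpha_1 g(x)/G(x) - \omega\alpha_2 g(\omega x)/G(\omega x) = 0$. All function names and the test point $\tau_1 = \tau_2 = 0.1$ are assumptions of this sketch.

```python
import math


def binary_entropy(x):
    """Entropy in bits (assumed base, so that H * log 2 is in nats)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def psi_com(tau1, tau2, gamma1, gamma2, p1, p2):
    """Combinatorial exponent: (r1 H(tau1/r1) + r2 H(tau2/r2) + tau1 + tau2) log 2."""
    r1, r2 = gamma1 * (1.0 - p1), gamma2 * (1.0 - p2)
    h1 = r1 * binary_entropy(tau1 / r1) if r1 > 0 else 0.0
    h2 = r2 * binary_entropy(tau2 / r2) if r2 > 0 else 0.0
    return (h1 + h2 + tau1 + tau2) * math.log(2.0)


def g(x):
    return 2.0 / math.sqrt(math.pi) * math.exp(-x * x)


def G(x):
    return math.erf(x)  # equals (2 / sqrt(pi)) * integral_0^x exp(-y^2) dy


def stationarity(x, c, a1, a2, omega):
    """x times the left-hand side of the equation that defines x_0."""
    return 2.0 * c * x - a1 * g(x) / G(x) - omega * a2 * g(omega * x) / G(omega * x)


def solve_x0(c, a1, a2, omega, iters=200):
    """Bisection for x_0; assumes c > 0, omega > 0 and a1 + a2 > 0."""
    lo, hi = 1e-12, 1.0
    while stationarity(hi, c, a1, a2, omega) < 0.0:
        hi *= 2.0  # the function is increasing in x, so a sign change exists
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if stationarity(mid, c, a1, a2, omega) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def psi_ext(tau1, tau2, gamma1, gamma2, p1, p2, omega):
    """External angle exponent: c x0^2 - a1 log G(x0) - a2 log G(omega x0)."""
    c = (tau1 + gamma1 * p1) + omega ** 2 * (tau2 + gamma2 * p2)
    a1 = gamma1 * (1.0 - p1) - tau1
    a2 = gamma2 * (1.0 - p2) - tau2
    x0 = solve_x0(c, a1, a2, omega)
    return c * x0 ** 2 - a1 * math.log(G(x0)) - a2 * math.log(G(omega * x0))


# Illustrative evaluation at gamma1 = gamma2 = 0.5, p1 = 0.3, p2 = 0.05, omega = 2.5.
com_value = psi_com(0.1, 0.1, 0.5, 0.5, 0.3, 0.05)
ext_value = psi_ext(0.1, 0.1, 0.5, 0.5, 0.3, 0.05, 2.5)
```

Bisection is used because the stationarity function is increasing in $x$ (the ratio $g/G$ is decreasing), so the root is unique and bracketing is safe. The internal angle exponent is not covered by this sketch; computing $\delta_c$ would additionally require it and a search over $(\tau_1, \tau_2)$.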

Theorem 4.3 is a powerful result, since it allows us to find (numerically) the optimal set of weights for which the fewest possible measurements are needed to recover the signals almost surely. To this end, for fixed values of $\gamma_1$, $\gamma_2$, $p_1$ and $p_2$, one should find the ratio $\frac{w_{K_2}}{w_{K_1}}$ for which the critical threshold $\delta_c(\gamma_1, \gamma_2, p_1, p_2, \frac{w_{K_2}}{w_{K_1}})$ from Theorem 4.3 is minimum. We discuss this by some examples in Section 7. A generalization of Theorem 4.3 for a nonuniform model with an arbitrary number of classes ($u > 2$) will be given in a later section.
Figure 2: $\delta_c$ as a function of $\omega = \frac{w_{K_2}}{w_{K_1}}$ for $\gamma_1 = \gamma_2 = 0.5$. (a) $p_1 = 0.4$, $p_2 = 0.05$. (b) $p_1 = 0.65$, $p_2 = 0.1$.


As mentioned earlier, using Theorem 4.3 it is possible to find the optimal ratio $\omega = \frac{w_{K_2}}{w_{K_1}}$. It however requires an exhaustive search over the $\delta_c$ threshold for all possible values of $\omega$. For $\gamma_1 = \gamma_2 = 0.5$, $p_1 = 0.3$ and $p_2 = 0.05$, we have numerically computed $\delta_c(\gamma_1, \gamma_2, p_1, p_2, \frac{w_{K_2}}{w_{K_1}})$ as a function of $\frac{w_{K_2}}{w_{K_1}}$ and depicted the resulting curve in Figure 2a. This suggests that $\frac{w_{K_2}}{w_{K_1}} \approx 2.5$ is the optimal ratio that one can choose. Later we will confirm this using simulations.

Note that $\delta_c$ given in Theorem 4.3 is a weak bound on the ratio $\delta = \frac{m}{n}$. In other words, it determines the minimum number of measurements so that for a random sparse signal from the nonuniform sparse model and a random support set, the recovery is successful with high probability. It is possible to obtain a strong bound for $\delta$, using a union bound on all possible support sets in the model, and all possible sign patterns of the sparse vector. Similarly, a sectional bound can be defined which accounts for all possible support sets but almost all sign patterns. The expressions for the strong and sectional thresholds, which we denote by $\delta_c^{(S)}$ and $\delta_c^{(T)}$, are very similar to $\delta_c$ in Theorem 4.3, except for a slight modification in the combinatorial exponent term $\psi_{com}$. This will be elaborated in Section 5.2.3.

It is worthwhile to consider some asymptotic cases of the presented nonuniform model and some of their implications. First of all, when one of the subclasses is empty, e.g. $\gamma_1 = 0$, then the obtained weak and strong thresholds are equal to the corresponding thresholds of $\ell_1$ minimization for a sparsity fraction $p = p_2$. Furthermore, if the sparsity fractions $p_1$ and $p_2$ over the two classes are equal, and a unit weight ratio $\omega = 1$ is used, then the weak threshold $\delta_c$ is equal to the threshold of $\ell_1$ minimization for a sparsity fraction $p = p_1 = p_2$. In other words:

$$\delta_c(\gamma_1, \gamma_2, p, p, 1) = \delta_c(0, 1, 0, p, 1) \qquad (9)$$
This follows immediately from the derivations of the exponents in Theorem 4.3. However, the latter is not necessarily true for the strong threshold. In fact the computation of the strong threshold for regular $\ell_1$ minimization involves a union bound over a larger set of possible supports, and therefore the combinatorial exponent becomes larger. Therefore:

$$\delta_c^{(S)}(\gamma_1, \gamma_2, p, p, 1) \leq \delta_c^{(S)}(0, 1, 0, p, 1). \qquad (10)$$

A very important asymptotic case is when the unknown signal is fully dense over one of the subclasses, e.g. $p_1 = 1$, which accounts for a partially known support. This model is considered in the work of Vaswani et al. [5], with the motivation that in some applications (or due to previous processing steps), part of the support set can be fully identified². If the dense subclass is $K_1$ and $K_2 = \overline{K_1}$, then [5] suggests solving the following minimization program:

$$\min_{\mathbf{A}\mathbf{x} = \mathbf{y}} \|\mathbf{x}_{K_2}\|_1. \qquad (11)$$

It is possible to find exact thresholds for the above problem using the weighted $\ell_1$ minimization machinery presented in this paper. First, note that the solution of (11) is the asymptotic solution of the following weighted $\ell_1$ minimization, when $\omega \to \infty$:

$$\min_{\mathbf{A}\mathbf{x} = \mathbf{y}} \|\mathbf{x}_{K_1}\|_1 + \omega \|\mathbf{x}_{K_2}\|_1. \qquad (12)$$

Therefore the recovery threshold for (11) can be given by $\delta_c(\gamma_1, \gamma_2, 1, p_2, \omega)$ for $\omega \to \infty$. We prove the following theorem about the latter threshold:

Theorem 4.4. If $\omega \to \infty$, then $\delta_c(\gamma_1, \gamma_2, 1, p_2, \omega) \to \gamma_1 + \gamma_2 \delta_c(0, 1, 0, p_2, 1)$. In other words, when a subset of entries of size $\gamma_1 n$ are known to be nonzero, the minimum number of measurements that is required for almost surely successful recovery using (11) is equal to the total number of measurements needed if we were allowed to independently make measurements from the two parts and recover each using $\ell_1$ minimization.

The proof of this theorem is given in Appendix E.
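The limit in Theorem 4.4 is a simple weighted sum, which the following sketch spells out. The function name is an assumption, and the value passed for the plain $\ell_1$ weak threshold $\delta_c(0, 1, 0, p_2, 1)$ is a hypothetical placeholder, not a number computed in this paper; it has to be supplied from the known $\ell_1$ threshold curve.

```python
def partially_known_support_threshold(gamma1, delta_l1_p2):
    """Limit of delta_c(gamma1, gamma2, 1, p2, omega) as omega -> infinity.

    gamma1: fraction of entries known to be nonzero (the dense class K_1).
    delta_l1_p2: plain l1 weak threshold at sparsity fraction p2, supplied by
    the caller (assumption: read off the known l1 threshold curve).
    """
    gamma2 = 1.0 - gamma1  # K_2 is the complement of K_1
    return gamma1 + gamma2 * delta_l1_p2


# Hypothetical numbers: 20% of the entries known to be nonzero, and an assumed
# l1 threshold of 0.5 for the remaining part.
limit = partially_known_support_threshold(0.2, 0.5)
```

The first term accounts for the dense part, which needs as many measurements as it has entries, and the second term is the $\ell_1$ requirement of the sparse part rescaled by its relative size $\gamma_2$.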

A very important factor regarding the performance of any recovery method is its robustness. In other words, it is important to understand how resilient the recovery is in the case of compressible signals or in the presence of noise or model mismatch (i.e. incorrect knowledge of the sets or sparsity factors). We address this in the following theorem.

²Thanks to anonymous reviewers for pointing this out to us!

Theorem 4.5. Let $K_1$ and $K_2$ be two disjoint subsets of $\{1, 2, \ldots, n\}$, with $|K_1| = \gamma_1 n$, $|K_2| = \gamma_2 n$ and $\gamma_1 + \gamma_2 = 1$. Also suppose that the dimensions of the measurement matrix $\mathbf{A}$ satisfy $\delta = \frac{m}{n} \geq \delta_c^{(S)}(\gamma_1, \gamma_2, p_1, p_2, \omega)$ for positive real numbers $p_1$ and $p_2$ in $[0, 1]$ and $\omega > 0$. For positive $\epsilon_1, \epsilon_2$, assume that $L_1$ and $L_2$ are arbitrary subsets of $K_1$ and $K_2$ with cardinalities $(1 - \epsilon_1)\gamma_1 p_1 n$ and $(1 - \epsilon_2)\gamma_2 p_2 n$ respectively. With high probability, for every vector $\mathbf{x}_0$, if $\hat{\mathbf{x}}$ is the solution to the following linear program:

$$\min_{\mathbf{A}\mathbf{x} = \mathbf{A}\mathbf{x}_0} \|\mathbf{x}_{K_1}\|_1 + \omega \|\mathbf{x}_{K_2}\|_1, \qquad (13)$$

then the following holds

$$\|(\mathbf{x}_0 - \hat{\mathbf{x}})_{K_1}\|_1 + \omega \|(\mathbf{x}_0 - \hat{\mathbf{x}})_{K_2}\|_1 \leq C_{\epsilon_1, \epsilon_2} \left(\|(\mathbf{x}_0)_{\overline{L_1} \cap K_1}\|_1 + \omega \|(\mathbf{x}_0)_{\overline{L_2} \cap K_2}\|_1\right), \qquad (14)$$

where 

l + min(-fi£^,^) 

_ V 1-pi ' 1-P2 I 

1 — mm , 

V 1-pi ' I-P2 ' 

The above theorem has the following implications. First, if $\mathbf{x}_0$ is a (compressible) vector, such that its "significant" entries follow a nonuniform sparse model, then the recovery error of the corresponding weighted $\ell_1$ minimization can be bounded in terms of the $\ell_1$ norm of the "insignificant" part of $\mathbf{x}_0$ (i.e. the part where a negligible fraction of the energy of the signal is located or most entries have significantly small values, compared to the other part that has an overall large norm). Theorem 4.5 can also be interpreted as the robustness of the weighted $\ell_1$ scheme to model mismatch. If $K_1, K_2, p_1, p_2$ are the estimates of an actual nonuniform decomposition for $\mathbf{x}_0$ (based on which the minimum number of required measurements has been estimated), then the recovery error can be relatively small if the model estimation error is slight. Theorem 4.5 will be proved in Section 5.4.

5 Derivation of the Main Results

In this section we provide detailed proofs of the claims of Section 4. Let $\mathbf{x}_0$ be a random nonuniformly sparse signal with sparsity fractions $p_1$ and $p_2$ over the index subsets $K_1$ and $K_2$ respectively (Definition 2), and let $|K_1| = n_1$ and $|K_2| = n_2$. Also let $K$ be the support of $\mathbf{x}_0$. Let $E$ be the event that $\mathbf{x}_0$ is recovered exactly by (3), and $E^c$ be its complementary event. In order to bound the conditional error probability $\mathbb{P}\{E^c\}$ we adopt the idea of [16] to interpret the failure recovery event ($E^c$) in terms of the null space of the measurement matrix $\mathbf{A}$. This is stated in Theorem 4.1, which we prove here.

Proof of Theorem 4.1. Suppose the mentioned null space condition holds and define
$\hat{x} = \arg\min_{Ax=y} \sum_{i=1}^{n} w_i|x_i|$. Let $W = \mathrm{diag}(w_1, w_2, \dots, w_n)$. By the triangle inequality, we have:

$$\begin{aligned}
\|W\hat{x}\|_1 &= \|(W\hat{x})_K\|_1 + \|(W\hat{x})_{\bar K}\|_1 = \|(Wx^* + W\hat{x} - Wx^*)_K\|_1 + \|(W\hat{x})_{\bar K}\|_1 \\
&\ge \|(Wx^*)_K\|_1 - \|(W\hat{x} - Wx^*)_K\|_1 + \|(W\hat{x} - Wx^*)_{\bar K}\|_1 \\
&\ge \|Wx^*\|_1
\end{aligned}$$

where the second line uses the fact that $x^*$ vanishes outside $K$, and the last inequality is a result of the
fact that $\hat{x} - x^*$ is in the null space of $A$ and satisfies the mentioned null space condition. However, if
$\hat{x} \neq x^*$ then the last inequality is strict, while $\|W\hat{x}\|_1 \le \|Wx^*\|_1$ because $\hat{x}$ is a minimizer. This implies
that $\hat{x} = x^*$. Conversely, suppose there is some vector $z$ in $\mathcal{N}(A)$ such that $\|(Wz)_K\|_1 > \|(Wz)_{\bar K}\|_1$.
Defining $x^* = (z_K \; 0)^T$ and $x = (0 \; -z_{\bar K})^T$ implies that $Ax^* = Ax$ and $\|Wx^*\|_1 > \|Wx\|_1$. Therefore,
$x^*$ cannot be recovered from the weighted $\ell_1$ minimization. ■
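
The equivalence in Theorem 4.1 can be checked by hand in the simplest setting. The sketch below is an illustration only, under the assumption that the null space of $A$ is one-dimensional (spanned by a single vector `z`), so that the feasible set is a line and the weighted $\ell_1$ objective along it is convex and piecewise linear. The function names and the numerical values are hypothetical and do not come from the paper.

```python
def weighted_l1(x, w):
    """Weighted l1 norm: sum_i w_i * |x_i|."""
    return sum(wi * abs(xi) for wi, xi in zip(w, x))


def null_space_condition(z, w, K):
    """Check sum_{i in K} w_i|z_i| < sum_{i not in K} w_i|z_i| for one vector z.

    For a one-dimensional null space this single check covers every nonzero
    null-space vector, because the condition is invariant to rescaling z.
    """
    inside = sum(w[i] * abs(z[i]) for i in K)
    outside = sum(w[i] * abs(z[i]) for i in range(len(z)) if i not in K)
    return inside < outside


def minimize_on_line(x_star, z, w):
    """Minimize the weighted l1 norm over the feasible line {x_star + t * z}.

    The objective is convex and piecewise linear in t, so a minimizer is
    attained at a breakpoint t = -x_star[i] / z[i] where a coordinate vanishes.
    """
    breakpoints = [-xi / zi for xi, zi in zip(x_star, z) if zi != 0]
    candidates = [[xi + t * zi for xi, zi in zip(x_star, z)] for t in breakpoints]
    return min(candidates, key=lambda x: weighted_l1(x, w))


# Hypothetical example: signal supported on K = {0}, null space spanned by z.
z = [3.0, 1.0, 1.0, 0.5]
x_star = [1.0, 0.0, 0.0, 0.0]
K = {0}
uniform = [1.0, 1.0, 1.0, 1.0]
weighted = [1.0, 2.0, 2.0, 2.0]  # heavier penalty outside the support

for name, w in (("uniform", uniform), ("weighted", weighted)):
    recovered = minimize_on_line(x_star, z, w)
    print(name, null_space_condition(z, w, K), recovered)
```

With these values the null space condition fails for uniform weights and holds for the nonuniform weights, and the minimizer along the line coincides with `x_star` only in the second case.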

From this point on, we follow closely the steps towards calculating the upper bound on the failure
probability from [4], but with appropriate modifications. The key to our derivations is the following
lemma, which will be proven in Appendix A.

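Lemma 5.1. For a certain subset $K \subseteq \{1, 2, \dots, n\}$ with $|K| = k$, the event that the null space $\mathcal{N}(A)$
satisfies

$$\sum_{i\in K} w_i|z_i| \le \sum_{i\in \bar K} w_i|z_i|, \quad \forall z \in \mathcal{N}(A), \tag{15}$$

is equivalent to the event that for each $x$ supported on the set $K$ (or a subset of $K$)

$$\sum_{i\in K} w_i|x_i + z_i| + \sum_{i\in \bar K} w_i|z_i| \ge \sum_{i\in K} w_i|x_i|, \quad \forall z \in \mathcal{N}(A). \tag{16}$$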

5.1 Upper Bound on the Failure Probability 

Knowing Lemma 5.1, we are now in a position to derive the probability that condition (15) holds for a
support set $K$ with $|K| = k$, if we randomly choose an i.i.d. Gaussian matrix $A$. In the case of a random
i.i.d. Gaussian matrix, the distribution of the null space of $A$ is right-rotationally invariant, and sampling
from this distribution is equivalent to uniformly sampling a random $(n-m)$-dimensional subspace $Z$
from the Grassmann manifold $\mathrm{Gr}_{(n-m)}(n)$. The Grassmann manifold $\mathrm{Gr}_{(n-m)}(n)$ is defined as the set
of all $(n-m)$-dimensional subspaces of $\mathbb{R}^n$. We need to upper bound the complementary probability
$P = P\{E^c\}$, namely the probability that the (random) support set $K$ of $x$ (of random sign pattern) fails
the null space condition (16). We denote the null space of $A$ by $Z$. Because $Z$ is a linear space, for every
vector $z \in Z$, $az$ is also in $Z$ for all $a \in \mathbb{R}$. Therefore, if for a $z \in Z$ and $x$ condition (16) fails, by a
simple re-scaling of the vectors, we may assume without loss of generality that $x$ lies on the surface of

Figure 3: (a) A weighted $\ell_1$-ball $\mathcal{P}_w$, and (b) a linear hyperplane $Z$ passing through a point $x$ in the
interior of a one-dimensional face of $\mathcal{P}_w$.

any convex ball that surrounds the origin. Therefore we restrict our attention to those vectors $x$ from
the weighted $\ell_1$-sphere:

$$\{x \in \mathbb{R}^n \mid \sum_{i=1}^{n} w_i|x_i| = 1\}$$

that are only supported on the set $K$, or a subset of it. Since we are assuming that the distribution of
the nonzero entries of $x$ is symmetric, we can write:

$$P = P_{K,-} \tag{17}$$

where $P_{K,-}$ is the probability that for a specific support set $K$, there exists a $k$-sparse vector $x$ of a specific
sign pattern which fails the condition (16). By symmetry, without loss of generality, we assume the signs
of the elements of $x$ to be non-positive. Now we can focus on deriving the probability $P_{K,-}$. Since $x$ is
a non-positive $k$-sparse vector supported on the set $K$ and can be restricted to the weighted $\ell_1$-sphere
$\{x \in \mathbb{R}^n \mid \sum_{i=1}^{n} w_i|x_i| = 1\}$, $x$ is also on a $(k-1)$-dimensional face, denoted by $\mathcal{F}$, of the weighted $\ell_1$-ball

$$\mathcal{P}_w = \{y \in \mathbb{R}^n \mid \sum_{i=1}^{n} w_i|y_i| \le 1\} \tag{18}$$

The subscript $w$ in $\mathcal{P}_w$ is an indication of the weight vector $w = (w_1, \dots, w_n)^T$. Figure 3a shows
$\mathcal{P}_w$ for some nontrivial weight vector $w$. Now the probability $P_{K,-}$ is equal to the probability that
there exists an $x \in \mathcal{F}$, and there exists a $z \in Z$ ($z \neq 0$) such that

$$\sum_{i\in K} w_i|x_i + z_i| + \sum_{i\in \bar K} w_i|z_i| < \sum_{i\in K} w_i|x_i| = 1. \tag{19}$$

We start by studying the case for a specific point $x \in \mathcal{F}$ and, without loss of generality, we assume $x$
is in the relative interior of this $(k-1)$-dimensional face $\mathcal{F}$. For this particular $x$ on $\mathcal{F}$, the probability,
denoted by $P'_x$, that there exists a $z \in Z$ ($z \neq 0$) such that

$$\sum_{i\in K} w_i|x_i + z_i| + \sum_{i\in \bar K} w_i|z_i| < \sum_{i\in K} w_i|x_i| = 1 \tag{20}$$

is essentially the probability that a uniformly chosen $(n-m)$-dimensional subspace $Z$ shifted by the
point $x$, namely $(Z + x)$, intersects the weighted $\ell_1$-ball $\mathcal{P}_w$ non-trivially, namely, at some other point
besides $x$ (Figure 3b). From the fact that $Z$ is a linear subspace, the event that $(Z + x)$ intersects $\mathcal{P}_w$
is equivalent to the event that $Z$ intersects nontrivially with the cone $C_w(x)$ obtained by observing the
weighted $\ell_1$-ball $\mathcal{P}_w$ from the point $x$. (Namely, $C_w(x)$ is the conic hull of the point set $(\mathcal{P}_w - x)$, and of
course $C_w(x)$ has the origin of the coordinate system as its apex.) However, as noticed in the geometry
of convex polytopes [12, 13], the cones $C_w(x)$ are identical for any $x$ lying in the relative interior of the
face $\mathcal{F}$. This means that the probability $P_{K,-}$ is equal to $P'_x$, regardless of the fact that $x$ is only a single
point in the relative interior of the face $\mathcal{F}$. There are some singularities here because $x \in \mathcal{F}$ may not be
in the relative interior of $\mathcal{F}$, but it turns out that the $C_w(x)$ in this case is only a subset of the cone we
get when $x$ is in the relative interior of $\mathcal{F}$. So we do not lose anything if we restrict $x$ to be in the relative
interior of the face $\mathcal{F}$, namely we have

$$P_{K,-} = P'_x.$$

Now we only need to determine $P'_x$. From its definition, $P'_x$ is exactly the complementary Grassmann
angle [12] for the face $\mathcal{F}$ with respect to the polytope $\mathcal{P}_w$ under the Grassmann manifold $\mathrm{Gr}_{(n-m)}(n)$:
the probability that a uniformly distributed $(n-m)$-dimensional subspace $Z$ from the Grassmann manifold
$\mathrm{Gr}_{(n-m)}(n)$ intersects non-trivially with the cone $C_w(x)$ formed by observing the weighted $\ell_1$-ball $\mathcal{P}_w$
from the relative interior point $x \in \mathcal{F}$.
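
The reduction to a uniformly random subspace can be made concrete in the smallest possible case. The sketch below is an illustration under stated assumptions, not part of the paper's analysis: it takes $n = 2$, $m = 1$ and $K = \{1\}$, so that the null space of a Gaussian row vector $A = [a, b]$ is the line spanned by $(b, -a)$, whose direction is uniform on the circle. The closed form used for comparison follows from the fact that the ratio of two independent standard Gaussians is Cauchy distributed; the function names are hypothetical.

```python
import math
import random


def failure_probability_mc(w1, w2, trials=200_000, seed=0):
    """Monte Carlo estimate of P(w1*|z1| > w2*|z2|), i.e. failure of (15) for K = {1}.

    z spans the null space of a 1 x 2 matrix with i.i.d. standard Gaussian entries.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1, z2 = b, -a  # A = [a, b] annihilates (b, -a)
        if w1 * abs(z1) > w2 * abs(z2):
            failures += 1
    return failures / trials


def failure_probability_exact(w1, w2):
    """|z1/z2| is the absolute value of a standard Cauchy variable, so
    P(|z1/z2| > w2/w1) = 1 - (2/pi) * atan(w2/w1)."""
    return 1.0 - (2.0 / math.pi) * math.atan(w2 / w1)


for w1, w2 in ((1.0, 1.0), (1.0, 3.0)):
    print(w1, w2, failure_probability_mc(w1, w2), failure_probability_exact(w1, w2))
```

Increasing the weight outside the support shrinks the set of subspace directions that violate the condition, which is the two-dimensional picture behind the Grassmann angle computation.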

Building on the works by L.A. Santalo [14] and P. McMullen [15] in high dimensional geometry
and convex polytopes, the complementary Grassmann angle for the $(k-1)$-dimensional face $\mathcal{F}$ can be
explicitly expressed as the sum of products of internal angles and external angles [13]:

$$2 \sum_{s \ge 0} \; \sum_{\mathcal{G} \in \Im_{m+1+2s}(\mathcal{P}_w)} \beta(\mathcal{F}, \mathcal{G})\, \zeta(\mathcal{G}, \mathcal{P}_w), \tag{21}$$

where $s$ is any nonnegative integer, $\mathcal{G}$ is any $(m+1+2s)$-dimensional face of $\mathcal{P}_w$ ($\Im_{m+1+2s}(\mathcal{P}_w)$ is
the set of all such faces), $\beta(\cdot,\cdot)$ stands for the internal angle and $\zeta(\cdot,\cdot)$ stands for the external angle, which
are defined as follows [13], [15]:

• An internal angle $\beta(\mathcal{F}_1, \mathcal{F}_2)$ is the fraction of the hypersphere $S$ covered by the cone obtained by
observing the face $\mathcal{F}_2$ from the face $\mathcal{F}_1$.^1 The internal angle $\beta(\mathcal{F}_1, \mathcal{F}_2)$ is defined to be zero when
$\mathcal{F}_1 \not\subseteq \mathcal{F}_2$ and is defined to be one if $\mathcal{F}_1 = \mathcal{F}_2$.

• An external angle $\zeta(\mathcal{F}_3, \mathcal{F}_4)$ is the fraction of the hypersphere $S$ covered by the cone of outward
normals to the hyperplanes supporting the face $\mathcal{F}_4$ at the face $\mathcal{F}_3$. The external angle $\zeta(\mathcal{F}_3, \mathcal{F}_4)$ is
defined to be zero when $\mathcal{F}_3 \not\subseteq \mathcal{F}_4$ and is defined to be one if $\mathcal{F}_3 = \mathcal{F}_4$.

In order to calculate the internal and external angles, it is important to use the symmetrical properties
of the weighted cross-polytope $\mathcal{P}_w$. First of all, $\mathcal{P}_w$ is nothing but the convex hull of the following set of
$2n$ vertices in $\mathbb{R}^n$:

$$\mathcal{P}_w = \mathrm{conv}\{\pm \frac{e_i}{w_i} \mid 1 \le i \le n\} \tag{22}$$

where $e_i$, $1 \le i \le n$, is the standard unit vector in $\mathbb{R}^n$ with the $i$th entry equal to 1. Every $(k-1)$-dimensional
face $\mathcal{F}$ of $\mathcal{P}_w$ is simply the convex hull of $k$ of the linearly independent vertices of $\mathcal{P}_w$.
In that case we say that $\mathcal{F}$ is supported on the index set $K$ of the $k$ indices corresponding to the
nonzero coordinates of the vertices of $\mathcal{F}$ in $\mathbb{R}^n$. More precisely, if $\mathcal{F} = \mathrm{conv}\{j_1 \frac{e_{i_1}}{w_{i_1}}, j_2 \frac{e_{i_2}}{w_{i_2}}, \dots, j_k \frac{e_{i_k}}{w_{i_k}}\}$ with
$j_r \in \{-1, +1\}$ for all $1 \le r \le k$, then $\mathcal{F}$ is said to be supported on the set $K = \{i_1, i_2, \dots, i_k\}$.
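
As a small sanity check of this description, the sketch below enumerates the vertices of a weighted cross-polytope and lists its faces as (support, sign pattern) pairs. It is a hedged illustration: the helper names and the example weight vector are hypothetical, and the count $2^k\binom{n}{k}$ used for comparison is the standard number of $(k-1)$-dimensional faces of a cross-polytope in $\mathbb{R}^n$.

```python
from itertools import combinations, product
from math import comb


def vertices(w):
    """The 2n vertices +e_i/w_i and -e_i/w_i of the weighted cross-polytope."""
    n = len(w)
    verts = []
    for i in range(n):
        for sign in (+1, -1):
            v = [0.0] * n
            v[i] = sign / w[i]
            verts.append(v)
    return verts


def faces(n, k):
    """All (k-1)-dimensional faces, each given by its support K and a sign pattern."""
    return [(K, signs)
            for K in combinations(range(n), k)
            for signs in product((+1, -1), repeat=k)]


w = [1.0, 1.0, 2.0, 2.0]  # hypothetical two-class weight vector
for v in vertices(w):
    # every vertex lies on the weighted l1-sphere
    assert abs(sum(wi * abs(vi) for wi, vi in zip(w, v)) - 1.0) < 1e-12

print(len(vertices(w)), len(faces(len(w), 2)), 2 ** 2 * comb(len(w), 2))
```

Identifying a face with a signed support set is what makes the later counting arguments purely combinatorial.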

5.2 Special Case of u = 2

The derivations of the previous section were for a general weight vector $w$. We now restrict ourselves to
the case of two classes, i.e. $u = 2$, namely $K_1$ and $K_2$ with $|K_1| = n_1$ and $|K_2| = n_2$. For this case, we
may assume that the $w_i$'s have the following particular form

$$\forall i \in \{1, 2, \dots, n\}: \quad w_i = \begin{cases} w_{K_1} & \text{if } i \in K_1 \\ w_{K_2} & \text{if } i \in K_2 \end{cases} \tag{23}$$

Proof of Theorem 4.2. The choice of $w$ as in (23) results in having two classes of geometrically identical
vertices, and many of the faces of $\mathcal{P}_w$ being isomorphic. In fact, two faces $\mathcal{F}$ and $\mathcal{F}'$ of $\mathcal{P}_w$ that are respectively
supported on the sets $K$ and $K'$ are geometrically isomorphic^2 if $|K \cap K_1| = |K' \cap K_1|$ and $|K \cap K_2| =
|K' \cap K_2|$.^3 In other words, the only thing that distinguishes the morphology of the faces of $\mathcal{P}_w$ is the
proportion of their support sets that is located in $K_1$ or $K_2$. Therefore for two faces $\mathcal{F}$ and $\mathcal{G}$ with $\mathcal{F}$
supported on $K$ and $\mathcal{G}$ supported on $L$ ($K \subseteq L$), $\beta(\mathcal{F}, \mathcal{G})$ is only a function of the parameters $k_1 = |K \cap K_1|$,
$k_2 = |K \cap K_2|$, $k_1 + t_1 = |L \cap K_1|$ and $k_2 + t_2 = |L \cap K_2|$. So, instead of $\beta(\mathcal{F}, \mathcal{G})$ we may write $\beta(k_1, k_2 | t_1, t_2)$
to indicate the internal angle between a $(k_1 + k_2 - 1)$-dimensional face $\mathcal{F}$ of $\mathcal{P}_w$ with $k_1$
vertices supported on $K_1$ and $k_2$ vertices supported on $K_2$, and a $(k_1 + k_2 + t_1 + t_2 - 1)$-dimensional
face $\mathcal{G}$ that encompasses $\mathcal{F}$ and has $t_1 + k_1$ vertices supported on $K_1$ and the remaining $t_2 + k_2$ vertices
supported on $K_2$. Similarly, instead of $\zeta(\mathcal{G}, \mathcal{P}_w)$ we write $\zeta(t_1 + k_1, t_2 + k_2)$ to denote the external angle
between a face $\mathcal{G}$ supported on a set $L$ with $|L \cap K_1| = d_1 = t_1 + k_1$ and $|L \cap K_2| = d_2 = t_2 + k_2$, and the
weighted $\ell_1$-ball $\mathcal{P}_w$. Using this notation and recalling the formula (21) we can write

$$\begin{aligned}
P_{K,-} &= 2 \sum_{s \ge 0} \; \sum_{\mathcal{G} \in \Im_{m+1+2s}(\mathcal{P}_w)} \beta(\mathcal{F}, \mathcal{G})\, \zeta(\mathcal{G}, \mathcal{P}_w) \\
&\le \sum_{\substack{0 \le t_1 \le n_1 - k_1 \\ 0 \le t_2 \le n_2 - k_2 \\ t_1 + t_2 > m - k_1 - k_2 + 1}} 2^{t_1 + t_2 + 1} \binom{n_1 - k_1}{t_1} \binom{n_2 - k_2}{t_2} \beta(k_1, k_2 | t_1, t_2)\, \zeta(t_1 + k_1, t_2 + k_2),
\end{aligned} \tag{24}$$

where in (24) we have used the fact that the number of faces $\mathcal{G}$ of $\mathcal{P}_w$ of dimension $k_1 + k_2 + t_1 + t_2 - 1$ that
encompass $\mathcal{F}$ and have $k_1 + t_1$ vertices supported on $K_1$ and the remaining $k_2 + t_2$ vertices supported
on $K_2$ is $2^{t_1 + t_2} \binom{n_1 - k_1}{t_1} \binom{n_2 - k_2}{t_2}$. In fact $\mathcal{G}$ has $k_1 + k_2 + t_1 + t_2$ vertices including the $k_1 + k_2$ vertices of $\mathcal{F}$.
The remaining $t_1 + t_2$ vertices can each be independently in the positive or negative orthant, therefore
resulting in the term $2^{t_1 + t_2}$. The two other combinatorial terms are the number of ways one can choose
$t_1$ vertices supported on the set $K_1 - K$ and $t_2$ vertices supported on $K_2 - K$. From (24) and (17) we
can conclude Theorem 4.2. ■

^1 Note the dimension of the hypersphere $S$ here matches the dimension of the corresponding cone discussed. Also, the
center of the hypersphere is the apex of the corresponding cone. All these defaults also apply to the definition of the external
angles.

^2 This means that there exists a rotation matrix $\Theta \in \mathbb{R}^{n \times n}$ which is unitary, i.e. $\Theta^T \Theta = I$, and maps $\mathcal{F}$ isometrically to
$\mathcal{F}'$, i.e. $\mathcal{F}' = \Theta \mathcal{F}$.

^3 Remember that $K_1$ and $K_2$ are the same sets as defined in the model description of Section 3.
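
The face count used in the proof, $2^{t_1+t_2}\binom{n_1-k_1}{t_1}\binom{n_2-k_2}{t_2}$, can be confirmed by brute force on a small polytope. The following sketch is only an illustration: it represents a face by a dictionary from index to sign, takes indices $0, \dots, n_1 - 1$ as $K_1$ and the rest as $K_2$ (an arbitrary labelling chosen here), and the function name is hypothetical.

```python
from itertools import combinations, product
from math import comb


def count_superfaces(n1, n2, F, t1, t2):
    """Count faces G containing the face F with t1 extra vertices on K1 and t2 on K2.

    F maps index -> sign. A face G contains F when it keeps every signed
    vertex of F; all candidate faces of the right dimension are enumerated.
    """
    n = n1 + n2
    size = len(F) + t1 + t2
    k1 = sum(1 for i in F if i < n1)
    count = 0
    for support in combinations(range(n), size):
        if sum(1 for i in support if i < n1) != k1 + t1:
            continue  # wrong split between the two classes
        for signs in product((+1, -1), repeat=size):
            G = dict(zip(support, signs))
            if all(G.get(i) == s for i, s in F.items()):
                count += 1
    return count


# Hypothetical example: n1 = n2 = 3, F has one vertex in each class.
F = {0: +1, 3: -1}
t1, t2 = 1, 2
brute = count_superfaces(3, 3, F, t1, t2)
formula = 2 ** (t1 + t2) * comb(3 - 1, t1) * comb(3 - 1, t2)
print(brute, formula)
```

Both numbers agree for this example, which is the combinatorial factor multiplying each product of angles in (24), up to the leading factor 2.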

In the following subsections we will derive the internal and external angles for a face $\mathcal{F}$ and a face
$\mathcal{G}$ containing $\mathcal{F}$, and will provide closed form upper bounds for them. We combine the terms together
and compute the exponents using the Laplace method in Section 5.2.3, and derive thresholds for the
negativity of the cumulative exponent.

5.2.1 Computation of Internal Angle 
Theorem 5.1. Let $Z$ be a random variable defined as

$$Z = (k_1 w_{K_1}^2 + k_2 w_{K_2}^2) X_1 - w_{K_1}^2 \sum_{i=1}^{t_1} X'_i - w_{K_2}^2 \sum_{i=1}^{t_2} X''_i,$$

where $X_1 \sim N(0, \frac{1}{2(k_1 w_{K_1}^2 + k_2 w_{K_2}^2)})$ is a normal distributed random variable, and $X'_i \sim HN(0, \frac{1}{2 w_{K_1}^2})$, $1 \le i \le t_1$,
and $X''_i \sim HN(0, \frac{1}{2 w_{K_2}^2})$, $1 \le i \le t_2$, are independent (from each other and from $X_1$) half-normal
distributed random variables. Let $p_Z(\cdot)$ denote the probability density function of $Z$ and
$c_0 = \frac{\sqrt{\pi}}{2^{t_1 + t_2}} \left((k_1 + t_1) w_{K_1}^2 + (k_2 + t_2) w_{K_2}^2\right)^{1/2}$. Then

$$\beta(k_1, k_2 | t_1, t_2) = c_0\, p_Z(0) \tag{25}$$

We now prove this theorem. Suppose that $\mathcal{F}$ is a $(k-1)$-dimensional face of the weighted $\ell_1$-ball

$$\mathcal{P}_w = \{y \in \mathbb{R}^n \mid \sum_{i=1}^{n} w_i|y_i| \le 1\}$$

supported on the subset $K$ with $|K| = k = k_1 + k_2$. Let $\mathcal{G}$ be an $(l-1)$-dimensional face of $\mathcal{P}_w$ supported
on the set $L$ with $\mathcal{F} \subset \mathcal{G}$. Also, let $|L \cap K_1| = k_1 + t_1$ and $|L \cap K_2| = k_2 + t_2$.
We first state the following lemma, the proof of which is given in Appendix B.

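Lemma 5.2. Let $\mathcal{F}$ be a $(k-1)$-dimensional face of $\mathcal{P}_w$ supported on the set $K = \{1, 2, \dots, k\}$, and $\mathcal{G}$
be an $(l-1)$-dimensional face of $\mathcal{P}_w$ that contains $\mathcal{F}$ and is supported on the set $L = \{1, 2, \dots, l\}$. Let $C_{\mathcal{F}^\perp, \mathcal{G}}$
be the positive cone of all the vectors $x \in \mathbb{R}^n$ that take the form:

$$-\sum_{i=1}^{k} b_i e_i + \sum_{i=k+1}^{l} b_i e_i, \tag{26}$$

where $b_i$, $1 \le i \le l$, are nonnegative real numbers and

$$\sum_{i=1}^{k} w_i b_i = \sum_{i=k+1}^{l} w_i b_i, \qquad \frac{b_1}{w_1} = \frac{b_2}{w_2} = \dots = \frac{b_k}{w_k}.$$

Then

$$\int_{C_{\mathcal{F}^\perp, \mathcal{G}}} e^{-\|x\|^2} dx = \beta(\mathcal{F}, \mathcal{G}) \cdot \pi^{(l-k)/2}. \tag{27}$$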

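From (27) we can find the expression for the internal angle. Define $U \subset \mathbb{R}^{l-k+1}$ as the set of all
nonnegative vectors $(x_1, x_2, \dots, x_{l-k+1})$ satisfying:

$$\Big(\sum_{r=1}^{k} w_r^2\Big) x_1 = \sum_{r=k+1}^{l} w_r^2 x_{r-k+1}$$

and define $f(x_1, \dots, x_{l-k+1}) : U \to C_{\mathcal{F}^\perp, \mathcal{G}}$ to be the following linear and bijective map:

$$f(x_1, \dots, x_{l-k+1}) = -\sum_{r=1}^{k} x_1 w_r e_r + \sum_{r=k+1}^{l} x_{r-k+1} w_r e_r.$$

Then

$$\begin{aligned}
\int_{C_{\mathcal{F}^\perp, \mathcal{G}}} e^{-\|x'\|^2} dx' &= \int_{U} e^{-\|f(x)\|^2} df(x) = |J(M)| \int_{\Gamma} e^{-\|f(x)\|^2} dx_2 \cdots dx_{l-k+1} \\
&= |J(M)| \int_{\Gamma} e^{-(\sum_{r=1}^{k} w_r^2) x_1^2 - \sum_{r=k+1}^{l} w_r^2 x_{r-k+1}^2} dx_2 \cdots dx_{l-k+1}
\end{aligned} \tag{28}$$

$\Gamma$ is the region described by

$$\Big(\sum_{r=1}^{k} w_r^2\Big) x_1 = \sum_{r=k+1}^{l} w_r^2 x_{r-k+1}, \qquad x_r \ge 0, \; 2 \le r \le l-k+1 \tag{29}$$

where $|J(M)|$ is due to the change of integral variables and is essentially the determinant of the Jacobian
of the variable transform given by the $l \times (l-k)$ matrix $M$ below:

$$M_{i,j} = \begin{cases} -\frac{1}{\Omega} w_i w_{k+j}^2 & 1 \le i \le k, \; 1 \le j \le l-k \\ w_i & k+1 \le i \le l, \; j = i-k \\ 0 & \text{otherwise} \end{cases} \tag{30}$$

where $\Omega = \sum_{r=1}^{k} w_r^2$. The Jacobian is obtained by $|J(M)| = \det(M^T M)^{1/2}$. By finding the eigenvalues
of $M^T M$ we obtain:

$$|J(M)| = w_{K_1}^{t_1} w_{K_2}^{t_2} \Big(\frac{\Omega + t_1 w_{K_1}^2 + t_2 w_{K_2}^2}{\Omega}\Big)^{1/2} \tag{31}$$

Now we define a random variable

$$Z = \Big(\sum_{r=1}^{k} w_r^2\Big) X_1 - \sum_{r=k+1}^{l} w_r^2 X_{r-k+1}$$

where $X_1, X_2, \dots, X_{l-k+1}$ are independent random variables, with $X_r \sim HN(0, \frac{1}{2 w_{r-1+k}^2})$, $2 \le r \le
(l-k+1)$, half-normal distributed random variables and $X_1 \sim N(0, \frac{1}{2 \sum_{r=1}^{k} w_r^2})$ a normal distributed
random variable. Then by inspection, (28) is equal to $c_1 p_Z(0)$, where $p_Z(\cdot)$ is the probability density
function of the random variable $Z$, $p_Z(0)$ is this density evaluated at the point $Z = 0$, and

$$c_1 = \frac{\sqrt{\pi}^{\,l-k+1}}{2^{l-k}} \frac{1}{\prod_{r=k+1}^{l} w_r} \Big(\sum_{r=1}^{k} w_r^2\Big)^{1/2} |J(M)| = \frac{\sqrt{\pi}^{\,l-k+1}}{2^{l-k}} \left((k_1 + t_1) w_{K_1}^2 + (k_2 + t_2) w_{K_2}^2\right)^{1/2} \tag{32}$$

Combining these results, the proof of Theorem 5.1 is complete.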
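5.2.2 Computation of External Angle

Theorem 5.2. The external angle $\zeta(\mathcal{G}, \mathcal{P}_w) = \zeta(d_1, d_2)$ between the face $\mathcal{G}$ and $\mathcal{P}_w$, where $\mathcal{G}$ is supported
on the set $L$ with $|L \cap K_1| = d_1$ and $|L \cap K_2| = d_2$, is given by:

$$\zeta(d_1, d_2) = \pi^{-\frac{n-l+1}{2}}\, 2^{n-l} \int_0^\infty e^{-x^2} \Big(\int_0^{\frac{w_{K_1} x}{\xi(d_1, d_2)}} e^{-y^2} dy\Big)^{r_1} \Big(\int_0^{\frac{w_{K_2} x}{\xi(d_1, d_2)}} e^{-y^2} dy\Big)^{r_2} dx, \tag{33}$$

where $\xi(d_1, d_2) = \sqrt{\sum_{i \in L} w_i^2} = \sqrt{d_1 w_{K_1}^2 + d_2 w_{K_2}^2}$, $l = d_1 + d_2$, $r_1 = n_1 - d_1$ and $r_2 = n_2 - d_2$.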

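Proof. Without loss of generality, assume that the support set of $\mathcal{G}$ is given by $L = \{n-l+1, n-l+2, \dots, n\}$
and consider the $(l-1)$-dimensional face

$$\mathcal{G} = \mathrm{conv}\Big\{\frac{e_{n-l+1}}{w_{n-l+1}}, \dots, \frac{e_{n-k}}{w_{n-k}}, \frac{e_{n-k+1}}{w_{n-k+1}}, \dots, \frac{e_n}{w_n}\Big\}$$

of the weighted $\ell_1$-ball $\mathcal{P}_w$. The $2^{n-l}$ outward normal vectors of the supporting hyperplanes of the facets
containing $\mathcal{G}$ are given by

$$\Big\{\sum_{i=1}^{n-l} j_i w_i e_i + \sum_{i=n-l+1}^{n} w_i e_i, \; j_i \in \{-1, 1\}\Big\}.$$

Then the outward normal cone $C_{\mathcal{G}, \mathcal{P}_w}$ at the face $\mathcal{G}$ is the positive hull of these normal vectors. Thus

$$\int_{C_{\mathcal{G}, \mathcal{P}_w}} e^{-\|x\|^2} dx = \zeta(\mathcal{G}, \mathcal{P}_w)\, V_{n-l}(S^{n-l}) \int_0^\infty e^{-r^2} r^{n-l} dr = \zeta(\mathcal{G}, \mathcal{P}_w) \cdot \pi^{(n-l+1)/2} \tag{34}$$

where $V_{n-l}(S^{n-l})$ is the spherical volume of the $(n-l)$-dimensional unit sphere $S^{n-l}$. Now define $U$ to
be the set

$$\{x \in \mathbb{R}^{n-l+1} \mid x_{n-l+1} \ge 0, \; |x_i / w_i| \le x_{n-l+1}, \; 1 \le i \le (n-l)\}$$

and define $f(x_1, \dots, x_{n-l+1}) : U \to C_{\mathcal{G}, \mathcal{P}_w}$ to be the linear and bijective map

$$f(x_1, \dots, x_{n-l+1}) = \sum_{i=1}^{n-l} x_i e_i + \sum_{i=n-l+1}^{n} w_i x_{n-l+1} e_i.$$

Then

$$\begin{aligned}
\int_{C_{\mathcal{G}, \mathcal{P}_w}} e^{-\|x'\|^2} dx' &= |J(M)| \int_{U} e^{-\|f(x)\|^2} dx \\
&= |J(M)| \int_0^\infty \int_{-w_1 x_{n-l+1}}^{w_1 x_{n-l+1}} \cdots \int_{-w_{n-l} x_{n-l+1}}^{w_{n-l} x_{n-l+1}} e^{-x_1^2 - \cdots - x_{n-l}^2 - (\sum_{i=n-l+1}^{n} w_i^2) x_{n-l+1}^2} dx_1 \cdots dx_{n-l+1} \\
&= |J(M)| \int_0^\infty e^{-\xi^2 x^2} \Big(\int_{-w_{K_1} x}^{w_{K_1} x} e^{-y^2} dy\Big)^{r_1} \Big(\int_{-w_{K_2} x}^{w_{K_2} x} e^{-y^2} dy\Big)^{r_2} dx
\end{aligned} \tag{35}$$

$M$ is the $n \times (n-l+1)$ change of variable matrix given by $M = \begin{pmatrix} I & 0 \\ 0 & w_L \end{pmatrix}$, where
$w_L = (w_{n-l+1}, w_{n-l+2}, \dots, w_n)^T$. Therefore $J(M) = \det(M^T M)^{1/2} = (d_1 w_{K_1}^2 + d_2 w_{K_2}^2)^{1/2}$. Replacing this and
a change of variable for $x$ (replace $\xi x$ with $x$) in (35), along with (34), complete the proof. ■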
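
For small dimensions the external angle can be evaluated numerically and compared with elementary geometry. The sketch below is a hedged illustration, not the authors' code: it assumes the external angle has the form $\pi^{-(n-l+1)/2}\, 2^{n-l} \int_0^\infty e^{-x^2} \prod_{j=1,2} \big(\int_0^{w_{K_j} x/\xi} e^{-y^2} dy\big)^{r_j} dx$ with $\xi^2 = d_1 w_{K_1}^2 + d_2 w_{K_2}^2$, which is what (34) and (35) give after the change of variable. Since $\int_0^a e^{-y^2} dy = \frac{\sqrt{\pi}}{2}\,\mathrm{erf}(a)$ and $r_1 + r_2 = n - l$, the prefactor collapses to $\pi^{-1/2}$. The integration rule, truncation point and function name are choices made here.

```python
import math


def external_angle(d1, d2, n1, n2, w1=1.0, w2=1.0, upper=10.0, steps=4000):
    """Numerically evaluate the assumed external-angle integral with Simpson's rule.

    Requires d1 + d2 >= 1 and an even number of steps. The integrand is
    exp(-x^2) * erf(w1*x/xi)^r1 * erf(w2*x/xi)^r2, scaled by 1/sqrt(pi).
    """
    r1, r2 = n1 - d1, n2 - d2
    xi = math.sqrt(d1 * w1 ** 2 + d2 * w2 ** 2)

    def integrand(x):
        return (math.exp(-x * x)
                * math.erf(w1 * x / xi) ** r1
                * math.erf(w2 * x / xi) ** r2)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0 / math.sqrt(math.pi)


# n = 2 with one index per class: the polytope is a rhombus with vertices
# (+-1/w1, 0) and (0, +-1/w2). At the vertex e_1/w1 the normal cone is spanned
# by (w1, +w2) and (w1, -w2), so its angular fraction is atan(w2/w1) / pi.
print(external_angle(1, 0, 1, 1, w1=1.0, w2=2.0), math.atan(2.0) / math.pi)
```

When the face is a facet ($l = n$) the same routine returns $1/2$, the fraction of the two-point sphere $S^0$ covered by a single ray, which is a second consistency check on the assumed normalization.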

5.2.3 Derivation of the Critical Weak and Strong $\delta_c$ Threshold

So far we have proved that the probability of the failure event is bounded by the formula

$$P\{E^c\} \le \sum_{\substack{0 \le t_1 \le n_1 - k_1 \\ 0 \le t_2 \le n_2 - k_2 \\ t_1 + t_2 > m - k_1 - k_2 + 1}} 2^{t_1 + t_2 + 1} \binom{n_1 - k_1}{t_1} \binom{n_2 - k_2}{t_2} \beta(k_1, k_2 | t_1, t_2)\, \zeta(t_1 + k_1, t_2 + k_2), \tag{36}$$

where we gave expressions for $\beta(k_1, k_2 | t_1, t_2)$ and $\zeta(t_1 + k_1, t_2 + k_2)$ in Sections 5.2.1 and 5.2.2 respectively.
Now our objective is to show that the R.H.S. of (36) will exponentially decay to 0 as $n \to \infty$, provided
that $\delta = \frac{m}{n}$ is greater than a critical threshold $\delta_c$, which we are trying to evaluate. To this end we
bound the exponents of the combinatorial, internal angle and external angle terms in (36), and find the
values of $\delta$ for which the net exponent is strictly negative. The minimum such $\delta$ will give us $\delta_c$. Starting
with the combinatorial term, we use Stirling's approximation on the binomial coefficients to obtain the
following as $n \to \infty$ and $\epsilon \to 0$:

$$\frac{1}{n} \log\Big(2^{t_1 + t_2 + 1} \binom{n_1 - k_1}{t_1} \binom{n_2 - k_2}{t_2}\Big) \to \Big(\gamma_1 (1 - p_1) H\Big(\frac{\tau_1}{\gamma_1 (1 - p_1)}\Big) + \gamma_2 (1 - p_2) H\Big(\frac{\tau_2}{\gamma_2 (1 - p_2)}\Big) + \tau_1 + \tau_2\Big) \log 2, \tag{37}$$

where $\tau_1 = \frac{t_1}{n}$ and $\tau_2 = \frac{t_2}{n}$.
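
The Stirling limit for the combinatorial exponent can be checked against the exact binomial coefficients at a large but finite dimension. The sketch below is an illustration under assumptions: it takes $H(\cdot)$ to be the binary entropy in bits (so that the factor $\log 2$ converts to natural logarithms), sets $k_i = p_i \gamma_i n$ and $t_i = \tau_i n$, and uses parameter values chosen here purely for demonstration. The function names are hypothetical.

```python
import math


def entropy2(q):
    """Binary entropy in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def log_binom(n, t):
    """Natural log of the binomial coefficient C(n, t) via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(t + 1) - math.lgamma(n - t + 1)


def exponent_finite(n, gamma, p, tau):
    """(1/n) * log of 2^(t1+t2+1) * C(n1-k1, t1) * C(n2-k2, t2) at dimension n."""
    total = math.log(2.0)  # the extra factor 2
    for g_i, p_i, tau_i in zip(gamma, p, tau):
        free = round(g_i * (1 - p_i) * n)  # n_i - k_i
        t_i = round(tau_i * n)
        total += t_i * math.log(2.0) + log_binom(free, t_i)
    return total / n


def exponent_limit(gamma, p, tau):
    """Limiting exponent predicted by Stirling's approximation (natural-log units)."""
    bits = sum(tau_i + g_i * (1 - p_i) * entropy2(tau_i / (g_i * (1 - p_i)))
               for g_i, p_i, tau_i in zip(gamma, p, tau))
    return bits * math.log(2.0)


gamma, p, tau = (0.5, 0.5), (0.4, 0.05), (0.1, 0.2)
for n in (10 ** 3, 10 ** 6):
    print(n, exponent_finite(n, gamma, p, tau), exponent_limit(gamma, p, tau))
```

The gap between the finite-$n$ value and the limit shrinks roughly like $\log n / n$, which is why only the exponential rate matters in the threshold computation.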



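For the external angle and internal angle terms we prove the following two exponents:

1. Let $g(x) = \frac{2}{\sqrt{\pi}} e^{-x^2}$ and $G(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-y^2} dy$. Also define $c = (\tau_1 + \gamma_1 p_1) + \omega^2 (\tau_2 + \gamma_2 p_2)$,
$\alpha_1 = \gamma_1 (1 - p_1) - \tau_1$ and $\alpha_2 = \gamma_2 (1 - p_2) - \tau_2$. Let $x_0$ be the unique solution to $x$ of the following:

$$2c - \frac{g(x) \alpha_1}{x G(x)} - \frac{\omega g(\omega x) \alpha_2}{x G(\omega x)} = 0$$

Define

$$\psi_{ext}(\tau_1, \tau_2) = c x_0^2 - \alpha_1 \log G(x_0) - \alpha_2 \log G(\omega x_0) \tag{38}$$

2. Let $b = \frac{\tau_1 + \omega^2 \tau_2}{\tau_1 + \tau_2}$ and $\varphi(\cdot)$ and $\Phi(\cdot)$ be the standard Gaussian pdf and cdf functions respectively.
Also let $\Omega' = \gamma_1 p_1 + \omega^2 \gamma_2 p_2$ and $Q(s) = \frac{\tau_1 \varphi(s)}{(\tau_1 + \tau_2) \Phi(s)} + \frac{\omega \tau_2 \varphi(\omega s)}{(\tau_1 + \tau_2) \Phi(\omega s)}$. Define the function $M(s) = -\frac{s}{Q(s)}$
and solve for $s$ in $M(s) = \frac{\tau_1 + \tau_2}{(\tau_1 + \tau_2) b + \Omega'}$. Let the unique solution be $s^*$ and set $y = s^* (b - \frac{1}{M(s^*)})$.
Compute the rate function $\Lambda^*(y) = s y - \frac{\tau_1}{\tau_1 + \tau_2} \Lambda_1(s) - \frac{\tau_2}{\tau_1 + \tau_2} \Lambda_1(\omega s)$ at the point $s = s^*$, where
$\Lambda_1(s) = \frac{s^2}{2} + \log(2 \Phi(s))$. The internal angle exponent is then given by:

$$\psi_{int}(\tau_1, \tau_2) = \Big(\Lambda^*(y) + \frac{\tau_1 + \tau_2}{2 \Omega'} y^2 + \log 2\Big)(\tau_1 + \tau_2). \tag{39}$$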
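
The external angle exponent can be evaluated with a few lines of root finding. The sketch below is a hedged illustration: it assumes $G(x) = \mathrm{erf}(x)$ and $g(x) = G'(x) = \frac{2}{\sqrt{\pi}} e^{-x^2}$, takes the stationarity equation for $x_0$ multiplied through by $x$, and solves it by bisection, which is a choice made here (the paper does not prescribe a numerical method). It further assumes $\alpha_1, \alpha_2 > 0$, and the parameter values are hypothetical.

```python
import math


def g(x):
    return 2.0 / math.sqrt(math.pi) * math.exp(-x * x)


def G(x):
    return math.erf(x)  # equals (2/sqrt(pi)) * integral_0^x exp(-y^2) dy


def stationarity(x, c, a1, a2, omega):
    """The equation for x0, multiplied by x: increasing in x for x > 0."""
    return 2 * c * x - a1 * g(x) / G(x) - a2 * omega * g(omega * x) / G(omega * x)


def objective(x, c, a1, a2, omega):
    """c x^2 - a1 log G(x) - a2 log G(omega x), the function minimized by x0."""
    return c * x * x - a1 * math.log(G(x)) - a2 * math.log(G(omega * x))


def psi_ext(tau1, tau2, gamma1, gamma2, p1, p2, omega):
    """Return (x0, psi_ext) for the two-class model; assumes alpha_1, alpha_2 > 0."""
    c = (tau1 + gamma1 * p1) + omega ** 2 * (tau2 + gamma2 * p2)
    a1 = gamma1 * (1 - p1) - tau1
    a2 = gamma2 * (1 - p2) - tau2
    lo, hi = 1e-9, 10.0  # stationarity is negative near 0 and positive at 10
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if stationarity(mid, c, a1, a2, omega) < 0:
            lo = mid
        else:
            hi = mid
    x0 = 0.5 * (lo + hi)
    return x0, objective(x0, c, a1, a2, omega)


x0, value = psi_ext(0.1, 0.2, 0.5, 0.5, 0.4, 0.05, 2.0)
print(x0, value)
```

Because $\log G$ is concave on the positive axis, the objective is convex there and the stationary point found by bisection is its unique minimizer, which is what the Laplace method requires.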

We now state the following lemmas, which are proved in Appendices C and D.

Lemma 5.3. Fix $\delta, \epsilon > 0$. There exists a finite number $n_0(\delta, \epsilon)$ such that

$$\frac{1}{n} \log(\zeta(t_1 + k_1, t_2 + k_2)) < -\psi_{ext}(\tau_1, \tau_2) + \epsilon, \tag{40}$$

uniformly in $0 \le t_1 \le n_1 - k_1$, $0 \le t_2 \le n_2 - k_2$ and $t_1 + t_2 > m - k_1 - k_2 + 1$, $n > n_0(\delta, \epsilon)$.

Lemma 5.4. Fix $\delta, \epsilon > 0$. There exists a finite number $n_1(\delta, \epsilon)$ such that

$$\frac{1}{n} \log(\beta(k_1, k_2 | t_1, t_2)) < -\psi_{int}(\tau_1, \tau_2) + \epsilon, \tag{41}$$

uniformly in $0 \le t_1 \le n_1 - k_1$, $0 \le t_2 \le n_2 - k_2$ and $t_1 + t_2 > m - k_1 - k_2 + 1$, $n > n_1(\delta, \epsilon)$.

Combining Lemmas 5.3 and 5.4, (37), and the bound in (36) we readily get the critical bound for $\delta_c$
as in Theorem 4.3.

Derivation of the strong and sectional thresholds can be easily done using union bounds to account
for all possible support sets and/or all sign patterns. The corresponding upper bound on the failure
probability for the strong threshold is given by:

$$\binom{n_1}{k_1} \binom{n_2}{k_2} 2^{k_1 + k_2} P_{K,-}. \tag{42}$$

It then follows that the strong threshold of $\delta$ is given by $\delta_c$ in Theorem 4.3, except that the combinatorial
exponent $\psi_{com}(\cdot, \cdot)$ must be corrected by adding a term

$$(\gamma_1 p_1 + \gamma_2 p_2 + \gamma_1 H(p_1) + \gamma_2 H(p_2)) \log 2, \tag{43}$$

to the RHS of (6). Similarly, for the sectional threshold, which deals with all possible support sets but
almost all sign patterns, the modification in the combinatorial exponent term is as follows:

$$(\gamma_1 H(p_1) + \gamma_2 H(p_2)) \log 2. \tag{44}$$

5.3 Generalizations 

Except for some subtlety in the large deviation calculations, the generalization of the results of the
previous section to an arbitrary $u \ge 2$ classes of entries is straightforward. Consider a nonuniform sparse
model with $u$ classes $K_1, \dots, K_u$ where $|K_i| = n_i = \gamma_i n$, and the sparsity fraction over the set $K_i$ is $p_i$,
and a recovery scheme based on weighted $\ell_1$ minimization with weight $\omega_i$ for the set $K_i$. The bound in
(21) is general and can always be used. Due to isomorphism, the internal and external angles $\beta(\mathcal{F}, \mathcal{G})$
and $\zeta(\mathcal{G}, \mathcal{P}_w)$ only depend on the number of vertices that the supports of $\mathcal{F}$ and $\mathcal{G}$ have in common with
each $K_i$. Therefore, a generalization of (36) would be:

$$P\{E^c\} \le 2 \sum_{\substack{\mathbf{0} \le \mathbf{t} \le \mathbf{n} - \mathbf{k} \\ \mathbf{1}^T \mathbf{t} > m - \mathbf{1}^T \mathbf{k} + 1}} \; \prod_{1 \le i \le u} 2^{t_i} \binom{n_i - k_i}{t_i} \beta(\mathbf{k} | \mathbf{t})\, \zeta(\mathbf{t} + \mathbf{k}) \tag{45}$$

where $\mathbf{t} = (t_1, \dots, t_u)^T$, $\mathbf{k} = (k_1, \dots, k_u)^T$, $\mathbf{n} = (n_1, \dots, n_u)^T$ and $\mathbf{1}$ is a vector of all ones. Invoking generalized
forms of Theorems 5.2 and 5.1 to approximate the terms $\beta(\mathbf{k} | \mathbf{t})$ and $\zeta(\mathbf{k} + \mathbf{t})$, we conclude the following theorem.

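Theorem 5.3. Consider a nonuniform sparse model with $u$ classes $K_1, \dots, K_u$ with $|K_i| = n_i = \gamma_i n$
and sparsity fractions $p_1, p_2, \dots, p_u$, where $n$ is the signal dimension. Also, let the functions
$g(\cdot)$, $G(\cdot)$, $\varphi(\cdot)$, $\Phi(\cdot)$ be as defined in Theorem 4.3. For positive values $\{\omega_i\}_{i=1}^{u}$, the recovery thresholds
(weak, sectional and strong) of the weighted $\ell_1$ minimization program:

$$\min_{Ax = y} \sum_{i=1}^{u} \omega_i \|x_{K_i}\|_1,$$

are given by the following expression:

$$\delta_c = \min\Big\{\delta \;\Big|\; \psi_{com}(\boldsymbol{\tau}) - \psi_{int}(\boldsymbol{\tau}) - \psi_{ext}(\boldsymbol{\tau}) < 0 \;\; \forall\, 0 \le \tau_i \le \gamma_i (1 - p_i) \; \forall\, 1 \le i \le u, \;\; \sum_{i=1}^{u} \tau_i > \delta - \sum_{i=1}^{u} \gamma_i p_i\Big\}$$

where $\psi_{com}$, $\psi_{int}$ and $\psi_{ext}$ are obtained from the following expressions:

1. $\psi_{com}(\boldsymbol{\tau}) = \log 2 \sum_{i=1}^{u} \big(\tau_i + \gamma_i (1 - p_i) H(\frac{\tau_i}{\gamma_i (1 - p_i)})\big)$ for the weak threshold. For the sectional threshold this
must be modified by adding a term $\log 2 \sum_{i=1}^{u} \gamma_i H(p_i)$. For the strong threshold, the term $\log 2 \sum_{i=1}^{u} \gamma_i p_i$
must also be added.

2. $\psi_{ext}(\boldsymbol{\tau}) = c x_0^2 - \sum_{i=1}^{u} \alpha_i \log G(\omega_i x_0)$, where $c = \sum_{i=1}^{u} \omega_i^2 (\tau_i + \gamma_i p_i)$, $\alpha_i = \gamma_i (1 - p_i) - \tau_i$ and $x_0$ is
the unique solution of $2c = \sum_{i=1}^{u} \frac{\omega_i g(\omega_i x_0) \alpha_i}{x_0 G(\omega_i x_0)}$.

3. $\psi_{int}(\boldsymbol{\tau}) = \Lambda \big(\Lambda^*(y) + \frac{\Lambda}{2 \sum_{i=1}^{u} \omega_i^2 \gamma_i p_i} y^2 + \log 2\big)$, where $\Lambda = \sum_{i=1}^{u} \tau_i$, and $y$ and $\Lambda^*(y)$ are obtained as follows.
Let $b = \frac{\sum_{i=1}^{u} \omega_i^2 \tau_i}{\Lambda}$ and $Q(s) = \sum_{i=1}^{u} \frac{\tau_i \omega_i \varphi(\omega_i s)}{\Lambda \Phi(\omega_i s)}$. Let $s^*$ be the solution to $s$ in $-\frac{Q(s)}{s} = b + \frac{\sum_{i=1}^{u} \omega_i^2 \gamma_i p_i}{\Lambda}$, and
$y = s^* (b + \frac{Q(s^*)}{s^*})$. Then $\Lambda^*(y) = s^* y - \frac{1}{\Lambda} \sum_{i=1}^{u} \tau_i \big(\frac{\omega_i^2 s^{*2}}{2} + \log(2 \Phi(\omega_i s^*))\big)$.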

5.4 Robustness 

Proof of Theorem 4.5. We first state the following lemma, which is very similar to Theorem 2 of [1]. We
skip its proof for brevity.

Lemma 5.5. Let $K \subset \{1, 2, \dots, n\}$ and the weight vector $w = (w_1, w_2, \dots, w_n)^T$ be fixed. Define
$W = \mathrm{diag}(w_1, w_2, \dots, w_n)$ and suppose $C > 1$ is given. For every vector $x_0 \in \mathbb{R}^{n \times 1}$, the solution $\hat{x}$ of
(3) satisfies

$$\|W(x_0 - \hat{x})\|_1 \le 2 \frac{C + 1}{C - 1} \sum_{i \in \bar K} w_i |(x_0)_i|, \tag{46}$$

if and only if for every $z \in \mathcal{N}(A)$ the following holds:

$$C \sum_{i \in K} w_i |z_i| \le \sum_{i \in \bar K} w_i |z_i|. \tag{47}$$



Let z = (zi, • • • , Zn)"^ be a vector in the null space of A, and assume that 

C Wi\zi\ = Wi\zi\. (48) 

Let -fCej and K^^^ the solutions of the following problems 

: max ^2 (^9) 

K^^ : max ^2 (^0) 

Let L'l = Li U and L'2 = L2D K^2- From the definition of K^-^ and Kf:^, it follows that 

ifii^ii > Y^i^ Wi\zi\, (51) 
Wi\zi\ > ^^^^^ w^ikii- (52) 

-^^562 jeZ^ni^2 
Adding $C' \big(\sum_{i \in K_{\epsilon_1}} w_i |z_i| + \sum_{i \in K_{\epsilon_2}} w_i |z_i|\big)$ to both sides of (48) and using (51) and (52), we can write:

C" Wilzil > tt;i|zj|+C" ^ ^ w^l^il + -I ^ ^ if^ikil (53) 



ieL[UL'2 iGLinL2 



> l + (C" + l)min(-^,-^) V w,\z,\. (54) 
V 1 - Pi 1 - P2 / ±r^_ 

Note that \L\\ = jipin and 1 = ^2V2'n. Therefore, since 5 = )72,Pi,P2, w), we know that 

SieL'iULJ, ^il-^il < YliizjTf^ Wi\zi\- From this and we conclude that 

C'>fl + (C' + l)min(-^,-^^)), (55) 

V 1 - Pi 1 - P2 / 

or equivalently 

l + niin(f£^,^) 
1 — mm , 

Using Lemma 5.5 and the above inequality, we conclude (14). ■



6 Approximate Support Recovery and Reweighted $\ell_1$

Using the analytical tools of this paper, it is possible to prove that a class of reweighted $\ell_1$ minimization
algorithms have strictly higher recovery thresholds for sparse signals whose nonzero entries follow certain
classes of distributions (e.g. Gaussian). The technical details of this claim are not presented here, since they
are beyond the scope of this paper. However, we briefly mention how a simple post-processing on
the output of $\ell_1$ minimization results in a nonuniform sparsity model with $u = 2$ classes close to the one
we introduced for the unknown signal. A more comprehensive study on this can be found in [11]. The
reweighted $\ell_1$ recovery algorithm proposed in [11] is composed of two steps. In the first step a standard
$\ell_1$ minimization is done, and based on the output, a set of entries where the signal is likely to reside (the
so-called approximate support) is identified. The unknown signal can thus be thought of as two classes,
one with a relatively high fraction of nonzero entries, and one with a small fraction. The second step is
a weighted $\ell_1$ minimization step where entries outside the approximate support set are penalized with a
constant weight larger than 1. The algorithm is as follows:

Algorithm 1.

1. Solve the $\ell_1$ minimization problem:

$$\hat{x} = \arg\min \|z\|_1 \quad \text{subject to } Az = Ax. \tag{57}$$

2. Obtain an approximation for the support set of $x$: find the index set $L \subset \{1, 2, \dots, n\}$ which corresponds
to the largest $k$ elements of $\hat{x}$ in magnitude.

3. Solve the following weighted $\ell_1$ minimization problem and declare the solution as output:

$$x^* = \arg\min \|z_L\|_1 + \omega \|z_{\bar L}\|_1 \quad \text{subject to } Az = Ax. \tag{58}$$
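
Step 2 and the weight assignment of step 3 are simple enough to sketch directly. The snippet below is a minimal illustration and deliberately omits the two optimization problems, which would be handed to a linear programming solver: it only selects the approximate support from a first-stage estimate and builds the weight vector that puts weight 1 on $L$ and $\omega > 1$ elsewhere. The function names and the numerical values are hypothetical.

```python
def approximate_support(x_hat, k):
    """Indices of the k largest-magnitude entries of the first-stage estimate (step 2)."""
    order = sorted(range(len(x_hat)), key=lambda i: abs(x_hat[i]), reverse=True)
    return set(order[:k])


def reweighting(n, support, omega):
    """Weights for step 3: 1 on the approximate support L, omega > 1 outside it."""
    return [1.0 if i in support else omega for i in range(n)]


# Hypothetical first-stage output of plain l1 minimization.
x_hat = [0.1, -2.0, 0.0, 1.5, -0.3]
L = approximate_support(x_hat, k=2)
weights = reweighting(len(x_hat), L, omega=3.0)
print(sorted(L), weights)
```

The resulting weight vector defines a two-class nonuniform model: the entries in $L$ form the class with a high fraction of nonzeros, and the remaining entries form the sparser class.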



For a given number of measurements, if the support size of $x$, namely $k = |K|$, is slightly larger than
the sparsity threshold of $\ell_1$ minimization, then a so-called robustness of $\ell_1$ minimization helps find a
lower bound $f_1$ for $\frac{|L \cap K|}{|L|}$, i.e. the sparsity fraction of $x$ over the set $L$. If $f_1$ is sufficiently close to 1, the
number of measurements could satisfy:

6> , max sPi-,l--,f[,f!„u;). (59) 

(T) 

Then the recovery is successful in the second step with high probability. Recall that $\delta_c$ is the sectional
threshold, which accounts for all possible support sets. Therefore, the condition for strict improvement
in the reweighted $\ell_1$ minimization is that:

5(0,1,0,-,!) > , max §p - ^J[J^,u;). (60) 

7 Simulation Results 

We demonstrate by some examples that appropriate weights can boost the recovery percentage. In Figure
4 we have shown the empirical recovery threshold of weighted $\ell_1$ minimization for different values of the
weight $\omega = \frac{w_{K_2}}{w_{K_1}}$, for two particular nonuniform sparse models. Note that the empirical threshold is
clearly identifiable by eye, and is very similar to the theoretical curve of Figure 2 for similar
settings. In another experiment, we fix $p_2$ and $n = 2m = 200$, and try $\ell_1$ and weighted $\ell_1$ minimization
for various values of $p_1$. We choose $n_1 = n_2 = \frac{n}{2}$. Figure 5a shows one such comparison for $p_2 = 0.05$ and
different values of $\omega$. Note that the optimal value of $\omega$ varies as $p_1$ changes. Figure 5b illustrates
how the optimal weighted $\ell_1$ minimization surpasses the ordinary $\ell_1$ minimization. The optimal curve is
basically achieved by selecting the best weight of Figure 5a for each single value of $p_1$. Figure 6 shows the
result of simulations in another setting where $p_2 = 0.1$ and $m = 0.75n$ (similar to the setting of Section
4). Note that these results match the theoretical results of Figures 2a and 2b very well.

Figure 4: Empirical recovery percentage of weighted $\ell_1$ minimization for different weight values $\omega$ and different numbers of
measurements $\delta = \frac{m}{n}$, with $n = 200$. Signals have been selected from nonuniform sparse models. White indicates perfect recovery.
(a) $\gamma_1 = \gamma_2 = 0.5$, $p_1 = 0.4$ and $p_2 = 0.05$. (b) $\gamma_1 = \gamma_2 = 0.5$, $p_1 = 0.65$ and $p_2 = 0.1$.

In Figure 7 we have displayed the performance of weighted $\ell_1$ minimization in the presence of noise.
The original signal is a nonuniformly sparse vector with sparsity fractions $p_1 = 0.4$, $p_2 = 0.05$ over two
subclasses with $\gamma_1 = \gamma_2 = 0.5$. However, a white Gaussian noise vector is added before compression. Figure 7
shows a scatter plot of all output signal to recovery error ratios as a function of the input SNR, for all
simulations. In Figure 8 the average curves are compared for different values of the weight $\omega$.

We have done some experiments with regular $\ell_1$ and weighted $\ell_1$ minimization recovery on some real
world data. We have chosen a pair of satellite images (Figure 9) taken in two different years, 1989 (left)
and 2000 (right), from the New Britain rainforest in Papua New Guinea. These images are generally recorded
to evaluate environmental effects such as deforestation. The difference of images taken at different times
is generally not very significant, and thus can be thought of as compressible. In addition, the difference is
usually more substantial over certain areas, e.g. forests. Therefore, it can be cast in a nonuniform sparse
model. We have applied $\ell_1$ minimization to recover the difference image over two subframes, identified
by green and red rectangles in Figure 9. In addition, a weighted $\ell_1$ minimization is also applied where
the frame pixels are divided into two classes of equal sizes, where the concentration of the forest area
is larger over one of the classes, and hence the difference image is less sparse. For the right frame (red),
the two classes are the bottom half and the top half of the frame, and for the left frame (green), they are the
left half and the right half. We casually assign the weight value $\omega = 2$ to the sparser region for weighted $\ell_1$
recovery, and unit weight to the denser region. The recovery errors for the two methods are displayed in Figure
10. The error is averaged over 50 realizations of i.i.d. Gaussian measurement matrices for each $\delta$. As can
be seen, even with this value of weight chosen intuitively, the recovery improvement is significant in the
weighted $\ell_1$ minimization.

Figure 5: Empirical probability of successful recovery for weighted $\ell_1$ minimization with different weights (unit weight for the first
subclass and $\omega$ for the other one) and suboptimal weights in a nonuniform sparse setting. $p_2 = 0.05$, $\gamma_1 = \gamma_2 = 0.5$ and $m = 0.5n = 100$.
$\omega^*$ in (b) is the optimum value of $\omega$ for each $p_1$ among the values shown in (a).

Figure 6: Empirical probability of successful recovery for different weights. $p_2 = 0.1$, $\gamma_1 = \gamma_2 = 0.5$ and $m = 0.75n = 150$.

In Figure 11 we have compared the recovery performance for the regular $\ell_1$ minimization and the
reweighted $\ell_1$ minimization of Algorithm 1 for different sparsity levels and different distributions for the
nonzero entries. Here the signal dimension is $n = 200$, and the number of measurements is $m = 112$, which
corresponds to a value of $\delta = 0.56$. We generated random sparse signals with i.i.d. entries coming from
certain distributions: Gaussian, uniform, Rayleigh, square root of chi-square with 4 degrees of freedom, and
square root of chi-square with 6 degrees of freedom. Solid lines represent the simulation results for ordinary
$\ell_1$ minimization, and different colors indicate different distributions. Dashed lines are used to show the
results for Algorithm 1. The reason why these distributions are selected and compared is elaborated in
[11], as they demonstrate various levels of improvement. Note that for Gaussian and uniform distributions,
which are flat and nonzero at the origin, the reweighted algorithm shows an impressive improvement of more
than 20% in the weak threshold (from 45 to 55).

Figure 7: Signal to recovery error ratio for weighted $\ell_1$ minimization with weight $\omega$ vs. input SNR for nonuniform sparse signals
with $\gamma_1 = \gamma_2 = 0.5$, $p_1 = 0.4$, $p_2 = 0.05$ superimposed with Gaussian noise. Panel (c): $\omega = 5$.

8 Conclusion and Future Work 

We analyzed the performance of weighted $\ell_1$ minimization for nonuniform sparse models. We computed
explicitly the phase transition curves for weighted $\ell_1$ minimization, and showed that with
proper weighting, the recovery threshold for weighted $\ell_1$ minimization can be higher than that of regular
$\ell_1$ minimization. We provided simulation results to verify this in both the noiseless and noisy settings.
Some of our simulations were performed on real-world satellite image data, where the nonuniform
sparse model is a valid assumption. An interesting question to be addressed in future work is
to characterize the gain in recovery percentage as a function of the number of distinguishable classes $u$ in
the nonuniform model. In addition, we have used the results of this paper to build iterative reweighted $\ell_1$
minimization algorithms that are provably strictly better than $\ell_1$ minimization when the nonzero entries
of the sparse signals are known to come from certain distributions (in particular Gaussian distributions)
[m [18]. The basic idea there is that a simple post-processing procedure on the output of $\ell_1$ minimization
results, with high probability, in a hypothetical nonuniform sparsity model for the unknown signal, which
can be exploited for improved recovery.

Figure 8: Average signal to recovery error ratio for weighted $\ell_1$ minimization with weight $\omega$ vs. input SNR for nonuniform sparse
signals with $\gamma_1 = \gamma_2 = 0.5$, $p_1 = 0.4$, $p_2 = 0.05$ superimposed with Gaussian noise.

Figure 9: Satellite images of the New Britain rainforest in Papua New Guinea, taken in 1989 (left) and 2000 (right). The image originally
belongs to the Royal Society for the Protection of Birds and was taken from the Guardian archive, from an article on deforestation.
Appendix. Proof of Important Lemmas
A Proof of Lemma 5.1

First, let us assume that $\sum_{i \in K} w_i |z_i| \le \sum_{i \in \bar{K}} w_i |z_i|$, $\forall z \in \mathcal{N}(A)$. Note that by assumption the $w_i$'s are all
nonnegative. Using the triangle inequality for the weighted $\ell_1$ norm (or for each absolute value term
on the LHS) we obtain

$$
\sum_{i \in K} w_i |x_i + z_i| + \sum_{i \in \bar{K}} w_i |z_i|
\ge \sum_{i \in K} w_i |x_i| - \sum_{i \in K} w_i |z_i| + \sum_{i \in \bar{K}} w_i |z_i|
\ge \sum_{i \in K} w_i |x_i|,
$$

thus proving the forward part of this lemma. Now let us assume instead that $\exists z \in \mathcal{N}(A)$ such that
$\sum_{i \in K} w_i |z_i| > \sum_{i \in \bar{K}} w_i |z_i|$. Then we can construct a vector $x$ supported on the set $K$ (or a subset of
$K$), with $x_K = -z_K$ (i.e. $x_i = -z_i$ $\forall i \in K$). Then we have

$$
\sum_{i \in K} w_i |x_i + z_i| + \sum_{i \in \bar{K}} w_i |z_i| = 0 + \sum_{i \in \bar{K}} w_i |z_i| < \sum_{i \in K} w_i |x_i|,
$$

proving the reverse part of this lemma.
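
Both directions of this argument can be checked on a toy instance. The sketch below is only an illustration: the $1 \times 3$ matrix $A = [1, 1, 1]$, the null-space vector $z$, the set $K$ and the two weight vectors are hypothetical choices, and the helper names are not from the paper. It evaluates the null-space condition for one $z$ and builds the vector $x_K = -z_K$ used in the reverse part.

```python
def weighted_l1(v, w):
    """Weighted l1 norm: sum_i w_i |v_i|."""
    return sum(wi * abs(vi) for wi, vi in zip(w, v))


def condition_holds(z, w, K):
    """Null-space condition for one vector z: weight on K <= weight off K."""
    on_k = sum(w[i] * abs(z[i]) for i in K)
    off_k = sum(w[i] * abs(z[i]) for i in range(len(z)) if i not in K)
    return on_k <= off_k


def worst_case_signal(z, K):
    """The vector from the reverse part of the proof: x_K = -z_K, zero elsewhere."""
    return [-z[i] if i in K else 0.0 for i in range(len(z))]


# Toy instance (hypothetical): A = [1, 1, 1], so z = (2, -1, -1) satisfies A z = 0.
z = [2.0, -1.0, -1.0]
K = {0}
x = worst_case_signal(z, K)
shifted = [xi + zi for xi, zi in zip(x, z)]  # x + z has the same measurements as x

for w in ([1.0, 1.0, 1.0], [2.0, 1.0, 1.0]):
    print(w, condition_holds(z, w, K), weighted_l1(shifted, w), weighted_l1(x, w))
```

With the second weight vector the condition fails for this $z$, and $x + z$ has a strictly smaller weighted norm than $x$, so weighted $\ell_1$ minimization would not return $x$.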

B Proof of Lemma 5.2

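Without loss of generality, assume that $\mathcal{F}$ has the following $k$ vertices: $\{\frac{1}{w_r} e_r, 1 \le r \le k\}$, where $e_r$ is
the $n$-dimensional standard unit vector with the $r$-th element equal to 1. Also assume that the $(l-1)$-dimensional
face $\mathcal{G}$ is the convex hull of the following $l$ vertices: $\frac{1}{w_r} e_r$, $1 \le r \le l$. Then the cone $C_{\mathcal{F},\mathcal{G}}$
formed by observing the $(l-1)$-dimensional face $\mathcal{G}$ of the weighted $\ell_1$-ball $\mathcal{P}_w$ from an interior point $x^{\mathcal{F}}$
of the face $\mathcal{F}$ is the positive cone of the vectors:

$$
\frac{e_j}{w_j} - \frac{e_i}{w_i}, \quad \text{for all } j \in J \setminus K,\ i \in K, \tag{61}
$$

and also the vectors

$$
\frac{e_{i_1}}{w_{i_1}} - \frac{e_{i_2}}{w_{i_2}}, \quad \text{for all } i_1 \in K,\ i_2 \in K, \tag{62}
$$

where $J = \{1,2,\dots,l\}$ is the support set for the face $\mathcal{G}$. So the cone $C_{\mathcal{F},\mathcal{G}}$ is the direct sum of the linear
hull $\mathcal{L}_{\mathcal{F}} = \mathrm{lin}\{\mathcal{F} - x^{\mathcal{F}}\}$ formed by the vectors in (62) and the cone $C_{\mathcal{F}^\perp,\mathcal{G}} = C_{\mathcal{F},\mathcal{G}} \cap \mathcal{L}_{\mathcal{F}}^\perp$, where $\mathcal{L}_{\mathcal{F}}^\perp$ is the
orthogonal complement to the linear subspace $\mathcal{L}_{\mathcal{F}}$. Then $C_{\mathcal{F}^\perp,\mathcal{G}}$ has the same (relative) spherical volume
as $C_{\mathcal{F},\mathcal{G}}$, and by definition the internal angle $\beta(\mathcal{F},\mathcal{G})$ is the relative spherical volume of the cone $C_{\mathcal{F},\mathcal{G}}$.
Now let us analyze the structure of $C_{\mathcal{F}^\perp,\mathcal{G}}$. We notice that the vector $e_0 = \sum_{r=1}^{k} w_r e_r$ is in the linear
space $\mathcal{L}_{\mathcal{F}}^\perp$ and is also the only such vector (up to linear scaling) supported on $K$. Thus a vector $x$ in
the positive cone $C_{\mathcal{F}^\perp,\mathcal{G}}$ must take the form

$$
-\sum_{i=1}^{k} b_i e_i + \sum_{i=k+1}^{l} b_i e_i, \tag{63}
$$

where $b_i$, $1 \le i \le l$, are nonnegative real numbers and

$$
\sum_{i=1}^{k} w_i b_i = \sum_{i=k+1}^{l} w_i b_i, \qquad \frac{b_1}{w_1} = \frac{b_2}{w_2} = \dots = \frac{b_k}{w_k}.
$$

Now that we have identified $C_{\mathcal{F}^\perp,\mathcal{G}}$, we try to calculate its relative spherical volume with respect to
the sphere surface $S^{l-k-1}$ to derive $\beta(\mathcal{F},\mathcal{G})$. First, we notice that $C_{\mathcal{F}^\perp,\mathcal{G}}$ is an $(l-k)$-dimensional cone.
Also, all the vectors $(x_1, \dots, x_n)$ in the cone $C_{\mathcal{F}^\perp,\mathcal{G}}$ take the form in (63). From [22],

$$
\int_{C_{\mathcal{F}^\perp,\mathcal{G}}} e^{-\|x\|^2}\,dx
= \beta(\mathcal{F},\mathcal{G})\, V_{l-k-1}(S^{l-k-1}) \int_0^\infty e^{-r^2} r^{l-k-1}\,dr
= \beta(\mathcal{F},\mathcal{G})\, \pi^{(l-k)/2},
$$

where $V_{l-k-1}(S^{l-k-1})$ is the spherical volume of the $(l-k-1)$-dimensional sphere $S^{l-k-1}$ and is given
by the well-known formula

$$
V_{i-1}(S^{i-1}) = \frac{i\,\pi^{i/2}}{\Gamma\left(\frac{i}{2} + 1\right)},
$$

where $\Gamma(\cdot)$ is the usual Gamma function. This completes the proof.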
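
The polar-coordinate identity behind this step can be checked numerically. The following is a minimal sketch, not part of the paper: it assumes the standard surface-area formula for the unit sphere $S^{d-1}$, namely $d\,\pi^{d/2}/\Gamma(d/2+1)$, and the function names and the midpoint-rule quadrature are illustrative choices. It confirms that the sphere area times the radial integral of $e^{-r^2} r^{d-1}$ equals $\pi^{d/2}$, so that the Gaussian mass of a cone is its relative spherical volume times $\pi^{d/2}$.

```python
import math


def sphere_area(d):
    """Surface area of the unit sphere S^{d-1} in R^d: d * pi^{d/2} / Gamma(d/2 + 1)."""
    return d * math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)


def radial_integral(d, upper=10.0, steps=20000):
    """Midpoint-rule estimate of the integral of exp(-r^2) r^{d-1} over r >= 0."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        r = (j + 0.5) * h
        total += math.exp(-r * r) * r ** (d - 1)
    return total * h


def gaussian_mass_of_cone(beta, d):
    """Integral of exp(-|x|^2) over a d-dimensional cone of relative spherical volume beta."""
    return beta * sphere_area(d) * radial_integral(d)
```

As a sanity check, the positive orthant in $\mathbb{R}^3$ has relative spherical volume $1/8$, and `gaussian_mass_of_cone(0.125, 3)` agrees with the direct product $(\sqrt{\pi}/2)^3$ up to quadrature error.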

C Proof of Lemma 5.3

Let $G$ denote the cumulative distribution function of a half-normal $HN(0, 1/2)$ random variable, i.e. a
random variable $X = |Z|$ where $Z \sim N(0, 1/2)$, and $G(x) = P\{X \le x\}$. Since $X$ has density function
$g(x) = \frac{2}{\sqrt{\pi}} \exp(-x^2)$, we know that

$$
G(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-y^2}\,dy, \tag{64}
$$

and so $G$ is just the classical error function $\mathrm{erf}(\cdot)$. We now justify the external angle exponent computations
in Theorem 4.3 and Lemma 5.3 using Laplace methods [3]. Using the same set of notations as in
Theorem 4.3, let $t_1 = \tau_1 n$, $t_2 = \tau_2 n$. Also define $c = (\tau_1 + \gamma_1 p_1) + \omega^2(\tau_2 + \gamma_2 p_2)$, $\alpha_1 = \gamma_1(1 - p_1) - \tau_1$ and
$\alpha_2 = \gamma_2(1 - p_2) - \tau_2$. Let $x_0$ be the unique solution to $x$ of the following:

$$
2c - \frac{g(x)\alpha_1}{xG(x)} - \frac{\omega g(\omega x)\alpha_2}{xG(\omega x)} = 0. \tag{65}
$$

Since $xG(x)$ is a smooth strictly increasing function ($\sim 0$ as $x \to 0$ and $\sim x$ as $x \to \infty$), and $g(x)$ is
strictly decreasing, the function $\frac{g(x)\alpha_1}{xG(x)} + \frac{\omega g(\omega x)\alpha_2}{xG(\omega x)}$ is one-one on the positive axis, and $x_0$ is a well-defined
function of $\tau_1$ and $\tau_2$. Hence, we denote it as $x_0(\tau_1,\tau_2)$. Then

$$
\psi_{ext}(\tau_1,\tau_2) = c x_0^2 - \alpha_1 \log G(x_0) - \alpha_2 \log G(\omega x_0). \tag{66}
$$
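To prove Lemma 5.3 we start from the explicit integral formula

$$
\zeta(d_1,d_2) = \pi^{-\frac{n-l+1}{2}}\, 2^{n-l} \int_0^\infty e^{-x^2}
\left( \int_0^{\frac{w_{K_1} x}{\sqrt{d_1 w_{K_1}^2 + d_2 w_{K_2}^2}}} e^{-y^2}\,dy \right)^{n_1 - d_1}
\left( \int_0^{\frac{w_{K_2} x}{\sqrt{d_1 w_{K_1}^2 + d_2 w_{K_2}^2}}} e^{-y^2}\,dy \right)^{n_2 - d_2} dx. \tag{67}
$$

After a change of integration variables (noticing that $w_{K_1} = 1$, $w_{K_2} = \omega$, $\frac{n_1 - d_1}{n} = \alpha_1$ and $\frac{n_2 - d_2}{n} = \alpha_2$)
we have

$$
\zeta(d_1,d_2) = \sqrt{\frac{cn}{\pi}} \int_0^\infty e^{-n c y^2 + n \alpha_1 \log G(y) + n \alpha_2 \log G(\omega y)}\,dy. \tag{68}
$$

This suggests that we should use Laplace's method; we define

$$
f_{\tau_1,\tau_2,n}(y) = e^{-n \psi_{\tau_1,\tau_2}(y)} \sqrt{\frac{cn}{\pi}}, \tag{69}
$$

with

$$
\psi_{\tau_1,\tau_2}(y) = c y^2 - \alpha_1 \log G(y) - \alpha_2 \log G(\omega y).
$$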

We note that the function $\psi_{\tau_1,\tau_2}$ is smooth and convex. Applying Laplace's method to $\psi_{\tau_1,\tau_2}$, but taking
care about regularity conditions and remainders as in [3], gives a result with uniformity in $(\tau_1,\tau_2)$.

Lemma C.1. For $\tau_1, \tau_2$, let $x_0(\tau_1,\tau_2)$ denote the minimizer of $\psi_{\tau_1,\tau_2}$. Then







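where for any $\delta, \eta > 0$,

$$
\sup R_n(\tau_1,\tau_2) = o(1) \quad \text{as } n \to \infty,
$$

with the supremum taken over $0 \le \tau_1 \le \gamma_1 - \rho_1$, $0 \le \tau_2 \le \gamma_2 - \rho_2$, $\delta - \rho_1 - \rho_2 \le \tau_1 + \tau_2 \le 1 - \rho_1 - \rho_2 - \eta$,
where $\rho_1 = k_1/n$, $\rho_2 = k_2/n$, $n_1/n = \gamma_1$ and $n_2/n = \gamma_2$.

In fact, in this lemma, the minimizer $x_0(\tau_1,\tau_2)$ is exactly the same $x_0(\tau_1,\tau_2)$ defined earlier in (65) and
the corresponding minimum value is the same as the defined exponent $\psi_{ext}$:

$$
\psi_{ext}(\tau_1,\tau_2) = \psi_{\tau_1,\tau_2}(x_0(\tau_1,\tau_2)). \tag{70}
$$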
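
As an illustration of how the exponent can be evaluated, the sketch below finds the minimizer by bisection on the derivative of $c y^2 - \alpha_1 \log G(y) - \alpha_2 \log G(\omega y)$, which is valid because that function is convex. It is a hedged example rather than the authors' procedure: the function names, the bracketing interval and the sample values $\tau_1 = \tau_2 = 0.1$, $\omega = 2$ are assumptions, while $\gamma_1 = \gamma_2 = 0.5$, $p_1 = 0.4$, $p_2 = 0.05$ are the values quoted in the figure captions.

```python
import math


def G(x):
    """CDF of |Z| with Z ~ N(0, 1/2); this is the error function."""
    return math.erf(x)


def g(x):
    """Density of |Z|: (2 / sqrt(pi)) * exp(-x^2)."""
    return 2.0 / math.sqrt(math.pi) * math.exp(-x * x)


def psi(x, c, a1, a2, omega):
    """Exponent c x^2 - a1 log G(x) - a2 log G(omega x)."""
    return c * x * x - a1 * math.log(G(x)) - a2 * math.log(G(omega * x))


def dpsi(x, c, a1, a2, omega):
    """Derivative of psi with respect to x."""
    return (2.0 * c * x - a1 * g(x) / G(x)
            - a2 * omega * g(omega * x) / G(omega * x))


def solve_x0(c, a1, a2, omega, lo=1e-9, hi=10.0, iters=200):
    """Root of dpsi by bisection; dpsi is increasing because psi is convex."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dpsi(mid, c, a1, a2, omega) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def psi_ext(tau1, tau2, gamma1, gamma2, p1, p2, omega):
    """Return (x0, exponent) for the given normalized parameters."""
    c = (tau1 + gamma1 * p1) + omega ** 2 * (tau2 + gamma2 * p2)
    a1 = gamma1 * (1.0 - p1) - tau1
    a2 = gamma2 * (1.0 - p2) - tau2
    x0 = solve_x0(c, a1, a2, omega)
    return x0, psi(x0, c, a1, a2, omega)


x0, value = psi_ext(tau1=0.1, tau2=0.1, gamma1=0.5, gamma2=0.5,
                    p1=0.4, p2=0.05, omega=2.0)
```

Bisection is used here only because it needs no derivative beyond the first and cannot leave the bracket; any one-dimensional convex minimizer would serve equally well.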

We can derive Lemma 5.3 from Lemma C.1. We note that as $\tau_1 + \tau_2 + \rho_1 + \rho_2 \to 1$, $x_0(\tau_1,\tau_2) \to 0$ and
$\psi_{ext}(\tau_1,\tau_2) \to 0$. For given $\epsilon > 0$ in the statement of Lemma 5.3, there is a largest $\eta_\epsilon < 1$ such that as
long as $\tau_1 + \tau_2 + \rho_1 + \rho_2 > \eta_\epsilon$, $\psi_{ext}(\tau_1,\tau_2) < \epsilon$. Note that $\zeta(t_1 + k_1, t_2 + k_2) \le 1$, so that for $\tau_1 + \tau_2 + \rho_1 + \rho_2 > \eta_\epsilon$,

$$
n^{-1} \log(\zeta(t_1 + k_1, t_2 + k_2)) \le 0 < -\psi_{ext}(\tau_1,\tau_2) + \epsilon,
$$

for $n \ge 1$. Applying the uniformity in $\tau_1, \tau_2$ given in Lemma C.1, we have as $n \to \infty$, uniformly over the
feasible region for $\tau_1, \tau_2$,

$$
n^{-1} \log(\zeta(t_1 + k_1, t_2 + k_2)) \le -\psi_{ext}(\tau_1,\tau_2) + o(1). \tag{71}
$$

Then Lemma 5.3 follows.
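
The Laplace-type behaviour used in this argument can be observed numerically. The sketch below is illustrative only: it takes the exponent $c y^2 - \alpha_1 \log G(y) - \alpha_2 \log G(\omega y)$ with $G = \mathrm{erf}$, ignores the polynomial prefactor, and uses hypothetical values $c = 0.8$, $\alpha_1 = 0.2$, $\alpha_2 = 0.375$, $\omega = 2$ together with a plain midpoint rule. It measures how far $-n^{-1} \log \int_0^\infty e^{-n\psi(y)}\,dy$ is from $\min_y \psi(y)$; the gap shrinks as $n$ grows, which is the $o(1)$ term in (71).

```python
import math


def psi(y, c, a1, a2, omega):
    """Exponent c y^2 - a1 log G(y) - a2 log G(omega y), with G = erf."""
    return c * y * y - a1 * math.log(math.erf(y)) - a2 * math.log(math.erf(omega * y))


def exponent_gap(n, c, a1, a2, omega, upper=6.0, steps=60000):
    """-(1/n) log of the integral of exp(-n psi) on (0, upper), minus min psi on the grid."""
    h = upper / steps
    values = [psi((j + 0.5) * h, c, a1, a2, omega) for j in range(steps)]
    psi_min = min(values)
    # Factor out the peak before exponentiating, to avoid underflow for large n.
    scaled_sum = sum(math.exp(-n * (v - psi_min)) for v in values)
    # log(integral) = -n * psi_min + log(h * scaled_sum), so the gap is:
    return -math.log(h * scaled_sum) / n


gap_small_n = exponent_gap(100, 0.8, 0.2, 0.375, 2.0)
gap_large_n = exponent_gap(4000, 0.8, 0.2, 0.375, 2.0)
```

Factoring out the peak value is a design choice that keeps the computation stable: without it, `exp(-n * psi)` underflows to zero everywhere once $n$ is in the thousands.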



D Proof of Lemma 5.4

Recall Theorem 5.1. By applying the large deviation techniques as in [3], we have







where is the same as defined in Section [5. 2. H wki = 1, = w, m' = ti +t2, Pm' = (*i +t20j)y is 
the expectation of -^(w^^ Z^iLi -^i ~ '^'k2 Si=i -^i)^ i-^i ^^"-^ -^i defined as in Theorem 15. ip . and 

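$$
\Lambda^*(y) = \max_s \; sy - \frac{t_1}{t_1 + t_2} \Lambda_1(s) - \frac{t_2}{t_1 + t_2} \Lambda_2(s),
$$

with

$$
\Lambda_1(s) = \frac{s^2}{2} + \log(2\Phi(s)), \qquad \Lambda_2(s) = \Lambda_1(\omega s).
$$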
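
For concreteness, the rate function can be evaluated numerically as sketched below. This is a hedged illustration, not the paper's procedure: it assumes $\Phi$ is the standard normal CDF (so that $\Lambda_1$ is the cumulant generating function of $|Z|$ with $Z \sim N(0,1)$), takes the mixture weights $t_1/(t_1+t_2)$ and $t_2/(t_1+t_2)$ from the equation for the maximizing $s$ that follows, and finds $s$ by bisection on $[-10, 10]$, a bracket chosen for numerical safety with moderate $\omega$. All function names are hypothetical.

```python
import math


def Phi(s):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-s / math.sqrt(2.0))


def phi(s):
    """Standard normal density."""
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)


def Lambda1(s):
    """Lambda_1(s) = s^2 / 2 + log(2 Phi(s))."""
    return 0.5 * s * s + math.log(2.0 * Phi(s))


def dLambda1(s):
    """Derivative of Lambda_1: s + phi(s) / Phi(s)."""
    return s + phi(s) / Phi(s)


def rate_function(y, t1, t2, omega, lo=-10.0, hi=10.0, iters=200):
    """Return (Lambda^*(y), maximizing s), by bisection on the derivative in s."""
    a, b = t1 / (t1 + t2), t2 / (t1 + t2)
    for _ in range(iters):
        s = 0.5 * (lo + hi)
        # The objective is concave in s, so its slope is decreasing.
        slope = y - a * dLambda1(s) - b * omega * dLambda1(omega * s)
        if slope > 0.0:
            lo = s
        else:
            hi = s
    s = 0.5 * (lo + hi)
    return s * y - a * Lambda1(s) - b * Lambda1(omega * s), s
```

At the mean value $y = \frac{t_1 + \omega t_2}{t_1 + t_2}\sqrt{2/\pi}$ the maximizer is $s = 0$ and the rate function vanishes; for smaller $y$ the maximizing $s$ is negative and the rate is positive.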



In fact, the second term in the sum can be argued to be negligible [3]. After a changing of variables 
y = ^^P'V, we know that the first term of ([72]) is upper-bounded by 



A . 1 . ^ . / '^""^,e-'(i^)^^-™'^*(^) dy. (73) 
Vtt 2i2 Jq 

As we know, $m'$ in the exponent of (73) is $t_1 + t_2$. Similar to evaluating the external angle decay
exponent, we will resort to Laplace's method in evaluating the internal angle decay exponent.
Define the function

$$
f_{t_1,t_2}(y) = y\, e^{-m' \left(\frac{m'}{2\Omega'}\right) y^2 - m' \Lambda^*(y)}.
$$

If we apply similar arguments as in proving Lemma C.1 and take care of the uniformity, we have the
following lemma:

Lemma D.1. Let $y^*_{t_1,t_2}$ denote the minimizer of $\left(\frac{m'}{2\Omega'}\right) y^2 + \Lambda^*(y)$. Then



oo 







where for rj > 



This means that 



m' suplog(i?m'(ii)*2)) = 0(1) as m! — 00. 



where m'~^ sup (ti+t^) log{Rm'{ti,t2)) = o(l) as m' — > 00. 

Now in order to find a lower bound on the decay exponent for $p_Z(0)$ (ultimately the decay exponent
$\psi_{int}(\tau_1,\tau_2)$), we need to focus on finding the minimizer $y^*_{t_1,t_2}$ of $\left(\frac{m'}{2\Omega'}\right) y^2 + \Lambda^*(y)$. To this end, by setting
the derivative of $\left(\frac{m'}{2\Omega'}\right) y^2 + \Lambda^*(y)$ with respect to $y$ to 0, and also noting the derivative $\Lambda^{*\prime}(y) = s$, we have

$$
s = -\frac{m'}{\Omega'}\, y. \tag{74}
$$

At the same time, the $s$ maximizing $\Lambda^*(y)$ must satisfy

$$
y = \frac{t_1}{t_1 + t_2} \Lambda_1'(s) + \frac{t_2}{t_1 + t_2} \Lambda_2'(s), \tag{75}
$$

namely (by writing out (75)),

$$
y = \frac{t_1 + \omega^2 t_2}{t_1 + t_2}\, s + Q(s), \tag{76}
$$

where $Q(s)$ is defined as in Theorem 5.1. By combining (74) and (75), we can solve for $s$ and $y$, thus
resulting in the decay exponent $\psi_{int}(\tau_1,\tau_2)$ as calculated in Theorem 5.1.

E Proof of Theorem 4.4

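Let $\delta' = \delta_c(\gamma_1, \gamma_2, 1, p_2, \omega)$ and $\delta'' = \delta_c(0, 1, 0, p_2, 1)$. From Theorem 4.3 we know that:

$$
\begin{aligned}
\delta' &= \min\{\delta \mid \psi'_{com}(0,\tau_2) - \psi'_{int}(0,\tau_2) - \psi'_{ext}(0,\tau_2) < 0 \;\; \forall\, 0 \le \tau_2 \le \gamma_2(1 - p_2),\ \tau_2 > \delta - \gamma_1 - \gamma_2 p_2\} \\
&= \gamma_2 \min\{\delta \mid \psi'_{com}(0,\gamma_2\tau_2) - \psi'_{int}(0,\gamma_2\tau_2) - \psi'_{ext}(0,\gamma_2\tau_2) < 0 \;\; \forall\, 0 \le \tau_2 \le 1 - p_2,\ \tau_2 > \delta - p_2\} + \gamma_1,
\end{aligned} \tag{77}
$$

and

$$
\delta'' = \min\{\delta \mid \psi''_{com}(0,\tau_2) - \psi''_{int}(0,\tau_2) - \psi''_{ext}(0,\tau_2) < 0 \;\; \forall\, 0 \le \tau_2 \le 1 - p_2,\ \tau_2 > \delta - p_2\}, \tag{78}
$$

where the exponents $\psi'_{com}$, $\psi'_{int}$, $\psi'_{ext}$, $\psi''_{com}$, $\psi''_{int}$ and $\psi''_{ext}$ can be found using Theorem 4.3. Here, we basically
show that when $\omega \to \infty$:

$$
\psi'_{com}(0,\gamma_2\tau_2) = \gamma_2 \psi''_{com}(0,\tau_2), \tag{79}
$$

$$
\psi'_{int}(0,\gamma_2\tau_2) = \gamma_2 \psi''_{int}(0,\tau_2), \tag{80}
$$

$$
\psi'_{ext}(0,\gamma_2\tau_2) = \gamma_2 \psi''_{ext}(0,\tau_2). \tag{81}
$$

(79) follows immediately from the definition of $\psi_{com}$. On the other hand, from (7), for $\omega \to \infty$ we
know that

$$
\psi'_{ext}(0,\gamma_2\tau_2) = c' x_0'^2 - \alpha_2' \log G(\omega x_0'),
$$

$$
\psi''_{ext}(0,\tau_2) = c'' x_0''^2 - \alpha_2'' \log G(x_0'').
$$

Following the details of derivations as in Theorem 4.3, we realize that:

$$
c' = \gamma_2 \omega^2 c'', \qquad \omega x_0' = x_0'', \qquad \alpha_2' = \gamma_2 \alpha_2'', \tag{82}
$$

which implies that $\psi'_{ext}(0,\gamma_2\tau_2) = \gamma_2 \psi''_{ext}(0,\tau_2)$. Finally, from (8), we know that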

Ct(0,72r2) = (A*(y') + +log2)72r2, 
Ct(0,r2) = (A*(/) + ^log2)r2. 

Following the details of derivations as in Theorem 4.3, we realize that for $\omega \to \infty$:

y' = y'\ n' = -12^". (83) 

which implies that $\psi'_{int}(0,\gamma_2\tau_2) = \gamma_2 \psi''_{int}(0,\tau_2)$. From (78) and (79)-(81) it follows that

$$
\delta' = \gamma_2 \delta'' + \gamma_1. \tag{84}
$$
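
The scaling of the external angle exponent used in this proof can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the definitions $c = (\tau_1 + \gamma_1 p_1) + \omega^2(\tau_2 + \gamma_2 p_2)$ and $\alpha_2 = \gamma_2(1 - p_2) - \tau_2$ from Appendix C with $p_1 = 1$ and $\tau_1 = 0$ (so the $\alpha_1$ term vanishes), substitutes $u = \omega x$ before minimizing, and picks the hypothetical values $\gamma_1 = \gamma_2 = 0.5$, $p_2 = 0.05$, $\tau_2 = 0.2$. The function names are not from the paper.

```python
import math


def min_exponent(c_scaled, alpha, lo=1e-9, hi=20.0, iters=200):
    """Minimum over u > 0 of c_scaled * u^2 - alpha * log erf(u), by bisection."""
    for _ in range(iters):
        u = 0.5 * (lo + hi)
        g_over_G = 2.0 / math.sqrt(math.pi) * math.exp(-u * u) / math.erf(u)
        if 2.0 * c_scaled * u - alpha * g_over_G < 0.0:
            lo = u
        else:
            hi = u
    u = 0.5 * (lo + hi)
    return c_scaled * u * u - alpha * math.log(math.erf(u))


def psi_ext_prime(gamma1, gamma2, p2, tau2, omega):
    """External exponent at (0, gamma2 * tau2) for p1 = 1, with u = omega * x."""
    c = gamma1 + omega ** 2 * (gamma2 * tau2 + gamma2 * p2)
    alpha2 = gamma2 * (1.0 - p2) - gamma2 * tau2
    return min_exponent(c / omega ** 2, alpha2)


def psi_ext_double_prime(p2, tau2):
    """External exponent at (0, tau2) for the single-class model (gamma2 = 1, omega = 1)."""
    return min_exponent(tau2 + p2, 1.0 - p2 - tau2)


def ratio(omega, gamma1=0.5, gamma2=0.5, p2=0.05, tau2=0.2):
    """Ratio of the two sides of the scaling relation for the external exponent."""
    return psi_ext_prime(gamma1, gamma2, p2, tau2, omega) / (
        gamma2 * psi_ext_double_prime(p2, tau2))
```

The ratio exceeds 1 for finite $\omega$ because of the extra $\gamma_1/\omega^2$ contribution to the quadratic coefficient, and it approaches 1 as $\omega$ grows, in line with the limit taken in the proof.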